History log of /netbsd-current/sys/kern/kern_synch.c
# 1.366 22-Nov-2023 riastradh

kpause(9): KASSERT -> KASSERTMSG

PR kern/57718 (might help to diagnose manifestations of the problem)


Revision tags: thorpej-ifq-base thorpej-altq-separation-base
# 1.365 15-Oct-2023 riastradh

kern_synch.c: Sort includes. No functional change intended.


# 1.364 15-Oct-2023 riastradh

sys/lwp.h: Nix sys/syncobj.h dependency.

Remove it in ddb/db_syncobj.h too.

New sys/wchan.h defines wchan_t so that users need not pull in
sys/syncobj.h to get it.

Sprinkle #include <sys/syncobj.h> in .c files where it is now needed.


# 1.363 05-Oct-2023 ad

Resolve !MULTIPROCESSOR build problem with the nasty kernel lock macros.


# 1.362 04-Oct-2023 ad

Eliminate l->l_biglocks. Originally I think it had a use but these days a
local variable will do.


# 1.361 04-Oct-2023 ad

Eliminate l->l_ncsw and l->l_nivcsw. From memory, I think they were added
before we had per-LWP struct rusage; the same is now tracked there.


# 1.360 23-Sep-2023 ad

Sigh.. Adjust previous to work as intended. The boosted LWP priority
didn't persist as far as the run queue because l_syncobj gets reset
earlier than I recalled.


# 1.359 23-Sep-2023 ad

- Simplify how priority boost for blocking in kernel is handled. Rather
than setting it up at each site where we block, make it a property of
syncobj_t. Then, do not hang onto the priority boost until userret(),
drop it as soon as the LWP is out of the run queue and onto a CPU.
Holding onto it longer is of questionable benefit.

- This allows two members of lwp_t to be deleted, and mi_userret() to be
simplified a lot (next step: trim it down to a single conditional).

- While here, constify syncobj_t and de-inline a bunch of small functions
like lwp_lock() which turn out not to be small after all (I don't know
why, but atomic_*_relaxed() seem to provoke a compiler shitfit above and
beyond what volatile does).


# 1.358 17-Jul-2023 riastradh

kern: New struct syncobj::sobj_name member for diagnostics.

XXX potential kernel ABI change -- not sure any modules actually use
struct syncobj but it's hard to rule that out because sys/syncobj.h
leaks into sys/lwp.h


# 1.357 13-Jul-2023 riastradh

kern: Print more detailed monotonic-clock-went-backwards messages.

Let's try harder to track this down.

XXX Should add dtrace probes.


# 1.356 23-Jun-2023 riastradh

tsleep: Comment out kernel lock assertion for now.

Breaks tpm(4) which breaks boot on a lot of systems. tpm(4)
shouldn't be using tsleep; it doesn't appear to even have an
interrupt handler for wakeups, so it could get by with kpause. If it
ever did sprout an interrupt handler it should use condvar(9) anyway.
But for now I don't have time to fix it tonight.


# 1.355 23-Jun-2023 riastradh

tsleep(9): Assert kernel lock held.

This is never safe to use without the kernel lock. It should only
appear in legacy subsystems that still run with the kernel lock.


# 1.354 09-Apr-2023 riastradh

kpause(9): Simplify assertion. No functional change intended.


Revision tags: netbsd-10-0-RC1 netbsd-10-base
# 1.353 05-Dec-2022 martin

If no more softints are pending on this cpu, clear ci_want_resched
(instead of just assigning ci_data.cpu_softints to it - the bitsets
are not the same).
Discussed on tech-kern "ci_want_resched bits vs. MD ci_data.cpu_softints bits".


# 1.352 26-Oct-2022 riastradh

kern/kern_synch.c: Get averunnable from sys/resource.h.


Revision tags: bouyer-sunxi-drm-base
# 1.351 29-Jun-2022 riastradh

sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

sleepq_enter(sq, l, lock); // locks l
...
sleepq_enqueue(sq, ..., sobj, ...); // assumes l locked, sets l_syncobj
... (*)
sleepq_block(...); // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com


# 1.350 10-Mar-2022 riastradh

kern: Fix synchronization of clearing LP_RUNNING and lwp_free.

1. membar_sync is not necessary here -- only a store-release is
required.

2. membar_consumer _before_ loading l->l_pflag is not enough; a
load-acquire is required.

Actually it's not really clear to me why any barriers are needed, since
the store-release and load-acquire should be implied by releasing and
acquiring the lwp lock (and maybe we could spin with the lock instead
of reading l->l_pflag unlocked). But maybe there's something subtle
about access to l->l_mutex that's not obvious here.
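The store-release / load-acquire pairing described above can be sketched in user space with C11 <stdatomic.h> and POSIX threads rather than the kernel's own atomic primitives. The struct, the LP_RUNNING value, and the l_exitcode field are hypothetical stand-ins: the point is only that data written before the flag is cleared with release ordering is guaranteed visible to a thread that observes the cleared flag with an acquire load, with no full membar_sync needed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define LP_RUNNING 0x1u	/* hypothetical bit value */

struct lwp_model {
	atomic_uint l_pflag;
	int l_exitcode;		/* hypothetical: written before LP_RUNNING clears */
};

/* Switching-away side: publish state, then clear LP_RUNNING with release. */
static void
switch_away(struct lwp_model *l, int code)
{
	l->l_exitcode = code;
	atomic_fetch_and_explicit(&l->l_pflag, ~LP_RUNNING,
	    memory_order_release);
}

/* lwp_free side: spin with acquire loads until LP_RUNNING is clear. */
static int
wait_and_free(struct lwp_model *l)
{
	while (atomic_load_explicit(&l->l_pflag, memory_order_acquire) &
	    LP_RUNNING)
		continue;
	return l->l_exitcode;	/* ordered after the store in switch_away */
}

static void *
switcher(void *arg)
{
	switch_away(arg, 42);
	return NULL;
}

static int
run_demo(void)
{
	struct lwp_model l;
	pthread_t t;
	int rc;

	atomic_init(&l.l_pflag, LP_RUNNING);
	l.l_exitcode = 0;
	rc = pthread_create(&t, NULL, switcher, &l);
	assert(rc == 0);
	(void)rc;
	int code = wait_and_free(&l);
	pthread_join(t, NULL);
	return code;
}
```

Replacing the acquire load with a relaxed load preceded by a consumer-only barrier, as in the pre-1.350 code, would not give this guarantee under the C11 model.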


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.349 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.348 20-May-2020 maxv

future-proof-ness


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.347 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).


Revision tags: phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.346 04-Apr-2020 ad

branches: 1.346.2;
preempt_needed(), preempt_point(): simplify the definition of these and
key on ci_want_resched in the interests of interactive response.


# 1.345 26-Mar-2020 ad

Leave the idle LWPs in state LSIDL even when running, so they don't mess up
output from ps/top/etc. Correctness isn't at stake, LWPs in other states
are temporarily on the CPU at times too (e.g. LSZOMB, LSSLEEP).


# 1.344 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


# 1.343 14-Mar-2020 ad

- Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.342 23-Feb-2020 ad

kpause(): is only awoken via timeout or signal, so use SOBJ_SLEEPQ_NULL like
_lwp_park() does, and dispense with the hashed sleepq & lock.


# 1.341 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.340 16-Feb-2020 ad

nextlwp(): fix a couple of locking bugs including one I introduced yesterday,
and add comments around same.


# 1.339 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


Revision tags: ad-namecache-base2
# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, like with the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over-eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.
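A minimal sketch of the "own cache line" technique, assuming a 64-byte cache line (typical on x86, but an assumption here) and using C11 alignas instead of the kernel's __cacheline_aligned. Apart from ci_want_resched, the field names are hypothetical. The packed layout puts fields written by different CPUs on one line, so every remote write invalidates the owner's copy; the padded layout gives each hot field its own line.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64	/* assumption: typical x86 cache line size */

/* Before: hot fields written by different CPUs share one cache line. */
struct cpu_info_packed {
	unsigned long ci_want_resched;	/* written by remote CPUs */
	unsigned long ci_ipi_mask;	/* hypothetical, also remote-written */
	unsigned long ci_local_counter;	/* hypothetical, owner-written */
};

/* After: each hot field starts on its own cache line. */
struct cpu_info_padded {
	alignas(CACHE_LINE) unsigned long ci_want_resched;
	alignas(CACHE_LINE) unsigned long ci_ipi_mask;
	alignas(CACHE_LINE) unsigned long ci_local_counter;
};

/* Do two members share a line (assuming the struct itself is line-aligned)? */
#define SAME_LINE(type, a, b) \
	(offsetof(type, a) / CACHE_LINE == offsetof(type, b) / CACHE_LINE)

static_assert(sizeof(struct cpu_info_padded) == 3 * CACHE_LINE,
    "padding costs one full line per hot field");
```

The trade-off is memory: the padded struct is 192 bytes instead of 24, which is why only fields that tprof(8)-style profiling shows to be contended are worth isolating.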


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread was stopped with ptrace(2), userland process must not
be able to unstop it deliberately or by an accident.

This was Windows-style behavior that made thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because by that
point the offending thread has already been switched out, so a backtrace taken on
a KASSERT failure does not show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces that do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflag of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways an LWP can be migrated to another CPU, and in
all cases the scheduler postpones the migration if the target LWP is running. One
example of LWP migration is load balancing: the scheduler periodically
looks for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it is set on an LWP,
the scheduler does not schedule that LWP. The actual migration of a scheduled LWP
is attempted when it leaves a running CPU, i.e., in mi_switch, and mi_switch does
NOT check the LP_BOUND flag. So if an LWP is scheduled first and then sets the
LP_BOUND flag, it can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html
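The ordering problem can be replayed in a small single-threaded model. Everything here is hypothetical except the LP_BOUND name: the struct layout, the l_target_cpu field, and the helper names are invented for illustration, and whether the real mi_switch drops or keeps a refused migration request is not stated in the log (the sketch drops it). The sequence is: the balancer schedules the LWP while it is still unbound, the LWP then binds itself, and the switch-out path decides whether to move it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LP_BOUND 0x1u	/* hypothetical bit value; only the name is real */

struct cpu { int id; };

struct lwp {
	unsigned l_pflag;
	struct cpu *l_cpu;
	struct cpu *l_target_cpu;	/* hypothetical: pending migration */
};

/* Load balancer: only schedules LWPs that are unbound when it looks. */
static void
balancer_schedule(struct lwp *l, struct cpu *dest)
{
	if ((l->l_pflag & LP_BOUND) == 0)
		l->l_target_cpu = dest;
}

/* curlwp_bind analogue. */
static void
bind_curlwp(struct lwp *l)
{
	l->l_pflag |= LP_BOUND;
}

/* Switch-out path; check_bound selects the post-1.314 behaviour. */
static void
switch_out(struct lwp *l, bool check_bound)
{
	assert(l->l_cpu != NULL);
	if (l->l_target_cpu != NULL &&
	    (!check_bound || (l->l_pflag & LP_BOUND) == 0))
		l->l_cpu = l->l_target_cpu;
	l->l_target_cpu = NULL;	/* assumption: a refused request is dropped */
}

/* Scheduled first, bound second, then switched out; returns final CPU id. */
static int
replay(bool check_bound)
{
	struct cpu c0 = { 0 }, c1 = { 1 };
	struct lwp l = { 0, &c0, NULL };

	balancer_schedule(&l, &c1);
	bind_curlwp(&l);
	switch_out(&l, check_bound);
	return l.l_cpu->id;
}
```

Without the check in the switch-out path the bound LWP ends up on CPU 1; with it, it stays on CPU 0.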


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
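To show what the split means in practice, here is a sketch of decomposing and rebuilding a composite status. It assumes the traditional BSD wait(2) layout (exit code in bits 8..15, terminating signal in the low 7 bits, core-dump flag 0200), mirrored by locally defined MY_W_EXITCODE and MY_WCOREFLAG rather than the real <sys/wait.h> macros; the struct is a hypothetical mirror of the three new fields, and stopped-process statuses are not handled.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed traditional BSD layout, mirroring W_EXITCODE() and WCOREFLAG. */
#define MY_WCOREFLAG		0200
#define MY_W_EXITCODE(ret, sig)	(((ret) << 8) | (sig))

struct xstat_split {	/* hypothetical mirror of the new fields */
	int xexit;	/* exit code (p_xexit) */
	int xsig;	/* terminating signal (p_xsig) */
	bool core;	/* WCOREFLAG recorded in p_sflag */
};

/* Exited and signalled cases only; stopped statuses are not modelled. */
static struct xstat_split
split_status(int status)
{
	struct xstat_split s;

	s.xsig = status & 0177;
	s.core = (status & MY_WCOREFLAG) != 0;
	s.xexit = (s.xsig == 0) ? (status >> 8) & 0377 : 0;
	return s;
}

/* Rebuild the composite value that wait(2) reports. */
static int
join_status(struct xstat_split s)
{
	assert((s.xsig & ~0177) == 0);
	return MY_W_EXITCODE(s.xexit, s.xsig) | (s.core ? MY_WCOREFLAG : 0);
}
```

Keeping the pieces separate means the kernel no longer has to guess which interpretation of a single integer applies; the composite form is built only at the point where wait(2) returns it.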


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsuya Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.292. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptability.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.
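A short worked example of the hz/n pitfall. With hz = 1000, hz / 100 gives 10 ticks, but with hz = 64 the same expression gives 0, which kpause treats as "no timeout". The helper below is a hypothetical illustration of the kind of usec-to-ticks conversion the message wishes for (it is not an existing NetBSD function): it rounds up and never turns a nonzero request into 0.

```c
#include <assert.h>
#include <limits.h>

/* The idiom from the log: "about 1/n of a second" as hz / n ticks. */
static int
timo_naive(int hz, int n)
{
	return hz / n;	/* 0 when hz < n, i.e. "sleep with no timeout" */
}

/*
 * Hypothetical helper: microseconds to ticks, rounding up, so any
 * nonzero request yields at least one tick.
 */
static int
usec_to_ticks_min1(long long usec, int hz)
{
	assert(hz > 0 && usec >= 0);
	if (usec == 0)
		return 0;
	long long t = (usec * hz + 999999) / 1000000;	/* round up */
	if (t > INT_MAX)
		t = INT_MAX;
	return (int)t;	/* >= 1 because usec > 0 and hz > 0 */
}
```

For example, a 10 ms sleep at hz = 64 becomes one tick (about 15.6 ms) instead of an unbounded sleep.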


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync
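The same idea can be sketched with C11 static_assert in place of the kernel's CTASSERT. The arrays are hypothetical stand-ins named after cexp and ldavg (renamed to avoid clashing with the C library's cexp), and they use doubles with illustrative decay values for readability, whereas the kernel works in fixed point. If someone adds a fourth decay factor without growing the averages array, the build fails instead of update_loadavg reading past the end.

```c
#include <assert.h>	/* static_assert (C11) */

#define NELEM(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Hypothetical stand-ins: per-interval decay factors for the 1-, 5- and
 * 15-minute load averages, and the averages they feed.  Illustrative values.
 */
static const double sketch_cexp[] = { 0.92, 0.98, 0.99 };
static double sketch_ldavg[3];

/* Compile-time check in the spirit of CTASSERT. */
static_assert(NELEM(sketch_cexp) == NELEM(sketch_ldavg),
    "cexp and ldavg must have the same number of entries");

/* Exponentially decaying average of the number of runnable LWPs. */
static void
update_loadavg(double nrun)
{
	for (unsigned i = 0; i < NELEM(sketch_ldavg); i++)
		sketch_ldavg[i] = sketch_cexp[i] * sketch_ldavg[i] +
		    (1.0 - sketch_cexp[i]) * nrun;
}
```

After one update with one runnable LWP, the short-term average (index 0) has moved further toward 1.0 than the long-term one (index 2), as expected from its smaller decay factor.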


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu; it should already be correct on return (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case when l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlewlp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for CPU affinity support.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlewlp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches which are not
usually taken for the old code, and one forwards branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also be possible to add the times to the processes profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symble definition error
by code duplication of gcc4 optimizer. Also note about this in comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
on each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

in the consequence, if a system has >16 processes
with runnable lwps, their p_estcpu are not likely increased.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve comment for ltsleep(), the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get on appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes except curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all CPUs rather
than from a timer callout; this allows time-slicing on non-primary CPUs.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broke.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing at all for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic niceing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.358 17-Jul-2023 riastradh

kern: New struct syncobj::sobj_name member for diagnostics.

XXX potential kernel ABI change -- not sure any modules actually use
struct syncobj but it's hard to rule that out because sys/syncobj.h
leaks into sys/lwp.h


# 1.357 13-Jul-2023 riastradh

kern: Print more detailed monotonic-clock-went-backwards messages.

Let's try harder to track this down.

XXX Should add dtrace probes.


# 1.356 23-Jun-2023 riastradh

tsleep: Comment out kernel lock assertion for now.

Breaks tpm(4) which breaks boot on a lot of systems. tpm(4)
shouldn't be using tsleep; it doesn't appear to even have an
interrupt handler for wakeups, so it could get by with kpause. If it
ever did sprout an interrupt handler it should use condvar(9) anyway.
But for now I don't have time to fix it tonight.


# 1.355 23-Jun-2023 riastradh

tsleep(9): Assert kernel lock held.

This is never safe to use without the kernel lock. It should only
appear in legacy subsystems that still run with the kernel lock.


# 1.354 09-Apr-2023 riastradh

kpause(9): Simplify assertion. No functional change intended.


Revision tags: netbsd-10-base
# 1.353 05-Dec-2022 martin

If no more softints are pending on this cpu, clear ci_want_resched
(instead of just assingning ci_data.cpu_softints to it - the bitsets
are not the same).
Discussed on tech-kern "ci_want_resched bits vs. MD ci_data.cpu_softints bits".


# 1.352 26-Oct-2022 riastradh

kern/kern_synch.c: Get averunnable from sys/resource.h.


Revision tags: bouyer-sunxi-drm-base
# 1.351 29-Jun-2022 riastradh

sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

sleepq_enter(sq, l, lock); // locks l
...
sleepq_enqueue(sq, ..., sobj, ...); // assumes l locked, sets l_syncobj
... (*)
sleepq_block(...); // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com


# 1.350 10-Mar-2022 riastradh

kern: Fix synchronization of clearing LP_RUNNING and lwp_free.

1. membar_sync is not necessary here -- only a store-release is
required.

2. membar_consumer _before_ loading l->l_pflag is not enough; a
load-acquire is required.

Actually it's not really clear to me why any barriers are needed, since
the store-release and load-acquire should be implied by releasing and
acquiring the lwp lock (and maybe we could spin with the lock instead
of reading l->l_pflag unlocked). But maybe there's something subtle
about access to l->l_mutex that's not obvious here.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.349 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.348 20-May-2020 maxv

future-proof-ness


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.347 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptable
waits with turnstiles (not currently done).


Revision tags: phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.346 04-Apr-2020 ad

branches: 1.346.2;
preempt_needed(), preempt_point(): simplify the definition of these and
key on ci_want_resched in the interests of interactive response.


# 1.345 26-Mar-2020 ad

Leave the idle LWPs in state LSIDL even when running, so they don't mess up
output from ps/top/etc. Correctness isn't at stake, LWPs in other states
are temporarily on the CPU at times too (e.g. LSZOMB, LSSLEEP).


# 1.344 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


# 1.343 14-Mar-2020 ad

- Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.342 23-Feb-2020 ad

kpause(): is only awoken via timeout or signal, so use SOBJ_SLEEPQ_NULL like
_lwp_park() does, and dispense with the hashed sleepq & lock.


# 1.341 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.340 16-Feb-2020 ad

nextlwp(): fix a couple of locking bugs including one I introduced yesterday,
and add comments around same.


# 1.339 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


Revision tags: ad-namecache-base2
# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
systems where there are fast CPUs and slow CPUs, like the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller of mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flags for LWPs suspended by _lwp_suspend and by a debugger

Once a thread has been stopped with ptrace(2), the userland process must not
be able to unstop it, either deliberately or by accident.

The previous behavior was Windows-style and made thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check that a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good one because at that
point the suspect thread has already been switched out, so a backtrace obtained
on a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on an SMP machine with few processes running (e.g. while
running sysinst), pserialize could hang for a long time until all
CPUs got an LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to a state where it would be
used and usable, due to reliability issues and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflag of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP can be migrated to another CPU,
and in all cases the scheduler postpones the migration if the target LWP is
running. One example of LWP migration is load balancing; the scheduler
periodically looks for CPU-hogging LWPs and schedules them to migrate (see
sched_lwp_stats). At that point the scheduler checks the LP_BOUND flag, and if
it is set on an LWP, the scheduler doesn't schedule that LWP. The migration of
a scheduled LWP is attempted when it is leaving a running CPU, i.e., in
mi_switch. And mi_switch does NOT check the LP_BOUND flag. So if an LWP is
scheduled first and then sets the LP_BOUND flag, the LWP can be migrated
regardless of the flag. To avoid this race condition, we need to check the
flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsuya Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so that they are now variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptibility.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit (hz/n then
truncates to 0). With a usec2ticks() routine it would at least be a
little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should already be correct in the returned LWP (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans on the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- A few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodic balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlewlp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2, the implementation of a new scheduler based on the
original SVR4 approach, with some inspiration for balancing and migration
taken from Solaris. It implements per-CPU runqueues, provides real-time (RT)
and time-sharing (TS) queues, is ready to support the POSIX real-time
extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. The requirement
breaks smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forwards branches which are not
usually taken for the old code, and one forwards branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symble definition error
by code duplication of gcc4 optimizer. Also note about this in comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve comment for ltsleep(), the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get on appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of users process, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all CPUs rather
than from a timer callout; this allows time-slicing on non-primary CPUs.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the header
files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a load average of one. I killed
this: it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing at all for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.358 17-Jul-2023 riastradh

kern: New struct syncobj::sobj_name member for diagnostics.

XXX potential kernel ABI change -- not sure any modules actually use
struct syncobj but it's hard to rule that out because sys/syncobj.h
leaks into sys/lwp.h


# 1.357 13-Jul-2023 riastradh

kern: Print more detailed monotonic-clock-went-backwards messages.

Let's try harder to track this down.

XXX Should add dtrace probes.


# 1.356 23-Jun-2023 riastradh

tsleep: Comment out kernel lock assertion for now.

Breaks tpm(4) which breaks boot on a lot of systems. tpm(4)
shouldn't be using tsleep; it doesn't appear to even have an
interrupt handler for wakeups, so it could get by with kpause. If it
ever did sprout an interrupt handler it should use condvar(9) anyway.
But for now I don't have time to fix it tonight.


# 1.355 23-Jun-2023 riastradh

tsleep(9): Assert kernel lock held.

This is never safe to use without the kernel lock. It should only
appear in legacy subsystems that still run with the kernel lock.


# 1.354 09-Apr-2023 riastradh

kpause(9): Simplify assertion. No functional change intended.


Revision tags: netbsd-10-base
# 1.353 05-Dec-2022 martin

If no more softints are pending on this cpu, clear ci_want_resched
(instead of just assigning ci_data.cpu_softints to it - the bitsets
are not the same).
Discussed on tech-kern "ci_want_resched bits vs. MD ci_data.cpu_softints bits".


# 1.352 26-Oct-2022 riastradh

kern/kern_synch.c: Get averunnable from sys/resource.h.


Revision tags: bouyer-sunxi-drm-base
# 1.351 29-Jun-2022 riastradh

sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

sleepq_enter(sq, l, lock); // locks l
...
sleepq_enqueue(sq, ..., sobj, ...); // assumes l locked, sets l_syncobj
... (*)
sleepq_block(...); // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com


# 1.350 10-Mar-2022 riastradh

kern: Fix synchronization of clearing LP_RUNNING and lwp_free.

1. membar_sync is not necessary here -- only a store-release is
required.

2. membar_consumer _before_ loading l->l_pflag is not enough; a
load-acquire is required.

Actually it's not really clear to me why any barriers are needed, since
the store-release and load-acquire should be implied by releasing and
acquiring the lwp lock (and maybe we could spin with the lock instead
of reading l->l_pflag unlocked). But maybe there's something subtle
about access to l->l_mutex that's not obvious here.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.349 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.348 20-May-2020 maxv

future-proof-ness


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.347 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).


Revision tags: phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.346 04-Apr-2020 ad

branches: 1.346.2;
preempt_needed(), preempt_point(): simplify the definition of these and
key on ci_want_resched in the interests of interactive response.


# 1.345 26-Mar-2020 ad

Leave the idle LWPs in state LSIDL even when running, so they don't mess up
output from ps/top/etc. Correctness isn't at stake, LWPs in other states
are temporarily on the CPU at times too (e.g. LSZOMB, LSSLEEP).


# 1.344 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


# 1.343 14-Mar-2020 ad

- Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.342 23-Feb-2020 ad

kpause(): is only awoken via timeout or signal, so use SOBJ_SLEEPQ_NULL like
_lwp_park() does, and dispense with the hashed sleepq & lock.


# 1.341 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.340 16-Feb-2020 ad

nextlwp(): fix a couple of locking bugs including one I introduced yesterday,
and add comments around same.


# 1.339 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


Revision tags: ad-namecache-base2
# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, like with the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread has been stopped with ptrace(2), the userland process must
not be able to unstop it, deliberately or by accident.

The previous behavior was Windows-style and made thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change where we check that a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point the suspect thread has already been switched out, so a backtrace obtained on
a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflag of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways an LWP can be migrated to another CPU, and in
all cases the scheduler postpones a migration if the target LWP is running. One
example of LWP migration is load balancing; the scheduler periodically
looks for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag and, if it is set on an LWP,
does not schedule that LWP. A scheduled migration is attempted when the LWP
leaves the CPU it is running on, i.e., in mi_switch. And mi_switch does NOT check
the LP_BOUND flag. So if an LWP is scheduled for migration first and then sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html
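The race can be illustrated with a small, single-threaded toy model in C. Everything in it (struct toy_lwp, TOY_LP_BOUND, toy_schedule_migration(), toy_switch_away()) is hypothetical and only mirrors the idea described above: the bound flag is checked when a migration is scheduled, and must be checked again at switch-away time because it may have been set in between. It is a sketch, not the kernel's code; in particular, simply dropping the pending migration of a bound LWP is an assumption of the model.

```c
/* Hypothetical stand-ins; these are NOT the NetBSD lwp_t fields. */
#define TOY_LP_BOUND	0x1u	/* plays the role of LP_BOUND in l_pflags */
#define TOY_NO_TARGET	(-1)	/* no migration pending */

struct toy_lwp {
	unsigned pflags;	/* thread-private flags */
	int cpu;		/* CPU the LWP currently runs on */
	int target_cpu;		/* pending migration target, or TOY_NO_TARGET */
};

/* Load-balancer side: only schedule a migration for unbound LWPs. */
void
toy_schedule_migration(struct toy_lwp *l, int target)
{
	if ((l->pflags & TOY_LP_BOUND) == 0)
		l->target_cpu = target;
}

/*
 * Switch-away side (the mi_switch analogue): re-check the bound flag,
 * because it may have been set after the migration was scheduled.
 * In this toy model a pending migration of a bound LWP is just dropped.
 */
void
toy_switch_away(struct toy_lwp *l)
{
	if (l->target_cpu != TOY_NO_TARGET &&
	    (l->pflags & TOY_LP_BOUND) == 0)
		l->cpu = l->target_cpu;
	l->target_cpu = TOY_NO_TARGET;
}
```

Without the re-check in toy_switch_away(), an LWP that was scheduled for migration and then bound itself would still move, which is exactly the window described in the message.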


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
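For reference, the composite status that p_xstat used to hold can be rebuilt from the three separated pieces. The sketch below assumes the traditional BSD wait(2) layout (exit code in bits 8-15, terminating signal in the low 7 bits, core-dump flag 0200), which the standard W* macros decode; compose_wstatus() and SKETCH_WCOREFLAG are hypothetical names, and this is not the kernel's actual code.

```c
#include <stdbool.h>
#include <sys/wait.h>	/* WIFEXITED() etc. decode the word built below */

/* Traditional BSD value of the "core dumped" bit (WCOREFLAG on NetBSD). */
#define SKETCH_WCOREFLAG	0200

/*
 * Hypothetical helper: combine an exit code, a signal number and a
 * core-dump flag into one wait(2)-style status word.  A nonzero xsig
 * means the process was killed by that signal.
 */
int
compose_wstatus(int xexit, int xsig, bool dumped)
{
	if (xsig != 0)
		return (xsig & 0177) | (dumped ? SKETCH_WCOREFLAG : 0);
	return (xexit & 0377) << 8;
}
```

Keeping the three values separate in the kernel and only composing them at the wait(2) boundary avoids having to know, at each use, which interpretation of p_xstat is current.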


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsuya Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptibility.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit (hz/n
truncates to 0, i.e. an unbounded sleep, once hz < n). With
a usec2ticks() routine it would at least be a little more tolerable.
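To make the hz/n problem concrete: integer division truncates, so hz/n is 0 whenever hz < n, and a timo of 0 means "no timeout". The sketch below contrasts that idiom with a conversion that rounds up, in the spirit of the usec2ticks() the message wishes for. Both functions are hypothetical illustrations, not NetBSD APIs, and the conversion assumes hz > 0 and that usec * hz fits in 64 bits.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical: the "1/n of a second" idiom, timo = hz / n. */
int
ticks_for_fraction(int hz, int n)
{
	return hz / n;	/* truncates: yields 0 whenever hz < n */
}

/*
 * Hypothetical usec-to-ticks conversion that rounds up, so (for hz > 0)
 * a nonzero request never becomes 0 ticks, i.e. never "sleep forever".
 */
int
usec_to_ticks_roundup(int hz, uint64_t usec)
{
	uint64_t t;

	if (usec == 0)
		return 0;
	t = (usec * (uint64_t)hz + 999999) / 1000000;
	return t > INT_MAX ? INT_MAX : (int)t;
}
```

For example, with hz = 100, ticks_for_fraction(100, 200) is 0 (an unbounded sleep if passed as timo), whereas usec_to_ticks_roundup(100, 5000) is 1 tick.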


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu; it should already be correct on return (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the former.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@).

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. The concept is based on the idea
that the activity of other threads will, as a side effect, perform the TLB
flush for the processes using emap. To track that, global and per-CPU
generation numbers are used. This idea was suggested by Andrew Doran; various
improvements to it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodic balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clauses 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches which are not
usually taken for the old code, and one forward branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forward branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought to also be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable the asm statement which defines the bpendtsleep symbol as a "handy
breakpoint" on all m68k ports, since it may cause a multiple symbol definition
error due to code duplication by the gcc4 optimizer. Also add a note about
this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve comment for ltsleep(), the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get the appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process, so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes except curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all cpu's rather
than from a timer callout; this allows time-slicing on non-primary cpu's.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the header
files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing at all for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.360 23-Sep-2023 ad

Sigh.. Adjust previous to work as intended. The boosted LWP priority
didn't persist as far as the run queue because l_syncobj gets reset
earlier than I recalled.


# 1.359 23-Sep-2023 ad

- Simplify how priority boost for blocking in kernel is handled. Rather
than setting it up at each site where we block, make it a property of
syncobj_t. Then, do not hang onto the priority boost until userret(),
drop it as soon as the LWP is out of the run queue and onto a CPU.
Holding onto it longer is of questionable benefit.

- This allows two members of lwp_t to be deleted, and mi_userret() to be
simplified a lot (next step: trim it down to a single conditional).

- While here, constify syncobj_t and de-inline a bunch of small functions
like lwp_lock() which turn out not to be small after all (I don't know
why, but atomic_*_relaxed() seem to provoke a compiler shitfit above and
beyond what volatile does).


# 1.358 17-Jul-2023 riastradh

kern: New struct syncobj::sobj_name member for diagnostics.

XXX potential kernel ABI change -- not sure any modules actually use
struct syncobj but it's hard to rule that out because sys/syncobj.h
leaks into sys/lwp.h


# 1.357 13-Jul-2023 riastradh

kern: Print more detailed monotonic-clock-went-backwards messages.

Let's try harder to track this down.

XXX Should add dtrace probes.


# 1.356 23-Jun-2023 riastradh

tsleep: Comment out kernel lock assertion for now.

Breaks tpm(4) which breaks boot on a lot of systems. tpm(4)
shouldn't be using tsleep; it doesn't appear to even have an
interrupt handler for wakeups, so it could get by with kpause. If it
ever did sprout an interrupt handler it should use condvar(9) anyway.
But for now I don't have time to fix it tonight.


# 1.355 23-Jun-2023 riastradh

tsleep(9): Assert kernel lock held.

This is never safe to use without the kernel lock. It should only
appear in legacy subsystems that still run with the kernel lock.


# 1.354 09-Apr-2023 riastradh

kpause(9): Simplify assertion. No functional change intended.


Revision tags: netbsd-10-base
# 1.353 05-Dec-2022 martin

If no more softints are pending on this cpu, clear ci_want_resched
(instead of just assingning ci_data.cpu_softints to it - the bitsets
are not the same).
Discussed on tech-kern "ci_want_resched bits vs. MD ci_data.cpu_softints bits".


# 1.352 26-Oct-2022 riastradh

kern/kern_synch.c: Get averunnable from sys/resource.h.


Revision tags: bouyer-sunxi-drm-base
# 1.351 29-Jun-2022 riastradh

sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

sleepq_enter(sq, l, lock); // locks l
...
sleepq_enqueue(sq, ..., sobj, ...); // assumes l locked, sets l_syncobj
... (*)
sleepq_block(...); // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com


# 1.350 10-Mar-2022 riastradh

kern: Fix synchronization of clearing LP_RUNNING and lwp_free.

1. membar_sync is not necessary here -- only a store-release is
required.

2. membar_consumer _before_ loading l->l_pflag is not enough; a
load-acquire is required.

Actually it's not really clear to me why any barriers are needed, since
the store-release and load-acquire should be implied by releasing and
acquiring the lwp lock (and maybe we could spin with the lock instead
of reading l->l_pflag unlocked). But maybe there's something subtle
about access to l->l_mutex that's not obvious here.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.349 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.348 20-May-2020 maxv

future-proof-ness


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.347 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptable
waits with turnstiles (not currently done).


Revision tags: phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.346 04-Apr-2020 ad

branches: 1.346.2;
preempt_needed(), preempt_point(): simplify the definition of these and
key on ci_want_resched in the interests of interactive response.


# 1.345 26-Mar-2020 ad

Leave the idle LWPs in state LSIDL even when running, so they don't mess up
output from ps/top/etc. Correctness isn't at stake, LWPs in other states
are temporarily on the CPU at times too (e.g. LSZOMB, LSSLEEP).


# 1.344 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


# 1.343 14-Mar-2020 ad

- Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.342 23-Feb-2020 ad

kpause(): is only awoken via timeout or signal, so use SOBJ_SLEEPQ_NULL like
_lwp_park() does, and dispense with the hashed sleepq & lock.


# 1.341 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.340 16-Feb-2020 ad

nextlwp(): fix a couple of locking bugs including one I introduced yesterday,
and add comments around same.


# 1.339 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


Revision tags: ad-namecache-base2
# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, like with the Rockwell RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accomodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread was stopped with ptrace(2), userland process must not
be able to unstop it deliberately or by an accident.

This was a Windows-style behavior that makes threading tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point a suspect thread is already switched so that a backtrace gotten on
a KASSERT failure doesn't point out where a context switch happens.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interface which do not map anything
into kernel in first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag to l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP is migrated to another CPU and in
any cases the scheduler postpones a migration if a target LWP is running. One
example of LWP migrations is a load balancing; the scheduler periodically
explores CPU-hogging LWPs and schedule them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag and if it's set to a LWP,
the scheduler doesn't schedule the LWP. A scheduled LWP is tried to be migrated
when it is leaving a running CPU, i.e., mi_switch. And mi_switch does NOT check
the LP_BOUND flag. So if an LWP is scheduled first and then it sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicated that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a new more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assmued).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptability.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it at least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth to move the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline function that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as proccesses and threads are created and destoyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case when l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for CPU affinity support.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forwards branches which are not
usually taken for the old code, and one forwards branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symbol definition error
due to code duplication by the gcc4 optimizer. Also add a note about this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that it was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation info functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get on appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all cpu's rather
than from a timer callout; this allows time-slicing on non-primary cpu's.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broke.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing at all for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic niceing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.358 17-Jul-2023 riastradh

kern: New struct syncobj::sobj_name member for diagnostics.

XXX potential kernel ABI change -- not sure any modules actually use
struct syncobj but it's hard to rule that out because sys/syncobj.h
leaks into sys/lwp.h


# 1.357 13-Jul-2023 riastradh

kern: Print more detailed monotonic-clock-went-backwards messages.

Let's try harder to track this down.

XXX Should add dtrace probes.


# 1.356 23-Jun-2023 riastradh

tsleep: Comment out kernel lock assertion for now.

Breaks tpm(4) which breaks boot on a lot of systems. tpm(4)
shouldn't be using tsleep; it doesn't appear to even have an
interrupt handler for wakeups, so it could get by with kpause. If it
ever did sprout an interrupt handler it should use condvar(9) anyway.
But for now I don't have time to fix it tonight.


# 1.355 23-Jun-2023 riastradh

tsleep(9): Assert kernel lock held.

This is never safe to use without the kernel lock. It should only
appear in legacy subsystems that still run with the kernel lock.


# 1.354 09-Apr-2023 riastradh

kpause(9): Simplify assertion. No functional change intended.


Revision tags: netbsd-10-base
# 1.353 05-Dec-2022 martin

If no more softints are pending on this cpu, clear ci_want_resched
(instead of just assigning ci_data.cpu_softints to it - the bitsets
are not the same).
Discussed on tech-kern "ci_want_resched bits vs. MD ci_data.cpu_softints bits".


# 1.352 26-Oct-2022 riastradh

kern/kern_synch.c: Get averunnable from sys/resource.h.


Revision tags: bouyer-sunxi-drm-base
# 1.351 29-Jun-2022 riastradh

sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

sleepq_enter(sq, l, lock); // locks l
...
sleepq_enqueue(sq, ..., sobj, ...); // assumes l locked, sets l_syncobj
... (*)
sleepq_block(...); // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com


# 1.350 10-Mar-2022 riastradh

kern: Fix synchronization of clearing LP_RUNNING and lwp_free.

1. membar_sync is not necessary here -- only a store-release is
required.

2. membar_consumer _before_ loading l->l_pflag is not enough; a
load-acquire is required.

Actually it's not really clear to me why any barriers are needed, since
the store-release and load-acquire should be implied by releasing and
acquiring the lwp lock (and maybe we could spin with the lock instead
of reading l->l_pflag unlocked). But maybe there's something subtle
about access to l->l_mutex that's not obvious here.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.349 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.348 20-May-2020 maxv

future-proof-ness


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.347 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).


Revision tags: phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.346 04-Apr-2020 ad

branches: 1.346.2;
preempt_needed(), preempt_point(): simplify the definition of these and
key on ci_want_resched in the interests of interactive response.


# 1.345 26-Mar-2020 ad

Leave the idle LWPs in state LSIDL even when running, so they don't mess up
output from ps/top/etc. Correctness isn't at stake, LWPs in other states
are temporarily on the CPU at times too (e.g. LSZOMB, LSSLEEP).


# 1.344 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


# 1.343 14-Mar-2020 ad

- Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.342 23-Feb-2020 ad

kpause(): is only awoken via timeout or signal, so use SOBJ_SLEEPQ_NULL like
_lwp_park() does, and dispense with the hashed sleepq & lock.


# 1.341 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.340 16-Feb-2020 ad

nextlwp(): fix a couple of locking bugs including one I introduced yesterday,
and add comments around same.


# 1.339 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


Revision tags: ad-namecache-base2
# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, like with the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over-eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread has been stopped with ptrace(2), the userland process must
not be able to unstop it, deliberately or by accident.

This was Windows-style behavior that made thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good one because, by that
point, the offending thread has already been switched out, so a backtrace taken
on a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflag of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP can be migrated to another CPU,
and in any case the scheduler postpones a migration if the target LWP is
running. One example of LWP migration is load balancing: the scheduler
periodically looks for CPU-hogging LWPs and schedules them to migrate (see
sched_lwp_stats). At that point the scheduler checks the LP_BOUND flag and,
if it is set on an LWP, does not schedule that LWP. The scheduler attempts to
migrate a scheduled LWP when it is leaving a running CPU, i.e., in mi_switch.
And mi_switch does NOT check the LP_BOUND flag. So if an LWP is scheduled
first and then sets the LP_BOUND flag, the LWP can be migrated regardless of
the flag. To avoid this race condition, we need to check the flag in
mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptability.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.
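The hazard described in 1.284 is plain integer truncation: passing hz / n ticks to ask for 1/n of a second yields 0 once hz < n, and a timo of 0 means sleeping indefinitely. A minimal sketch, assuming only the arithmetic; `ticks_for_fraction` and `usec_to_ticks_roundup` are hypothetical helpers (the latter is in the spirit of the usec2ticks() routine the commit wishes for, not an existing NetBSD function).

```c
#include <stdint.h>

/* The common idiom: request 1/n of a second as hz / n ticks.
 * Integer division truncates this to 0 whenever hz < n, and a
 * timeout of 0 is then interpreted as "no timeout". */
static int
ticks_for_fraction(int hz, int n)
{
	return hz / n;
}

/* Hypothetical helper: convert microseconds to ticks, rounding up,
 * so any nonzero request yields at least one tick. */
static int
usec_to_ticks_roundup(int hz, int64_t usec)
{
	int64_t ticks = (usec * hz + 999999) / 1000000;

	return (int)ticks;
}
```

For example, with hz = 50 a caller asking for hz / 100 gets 0 (sleep forever), while converting 10000 microseconds with rounding up gives 1 tick.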


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodic balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlewlp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - a new scheduler implementation, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared to support CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlewlp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forwards branches which are not
usually taken for the old code, and one forwards branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symbol definition error
due to code duplication by the gcc4 optimizer. Also note this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve comment for ltsleep(), the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get the appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.
flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all CPUs rather
than from a timer callout; this allows time-slicing on non-primary CPUs.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
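
The two properties listed for 1.68, client-owned handle storage and constant-time insertion and removal, follow from making the handle itself a node of an intrusive doubly linked list. The sketch below uses the hypothetical names demo_callout and demo_callout_list and shows only that idea. It is not the kernel's callout implementation, which also has to organize callouts by expiry time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callout handle: the client owns this storage, so arming
 * a callout never needs to allocate memory. Not NetBSD's struct callout. */
struct demo_callout {
	struct demo_callout *next, *prev;	/* intrusive linkage */
	int ticks;				/* hypothetical expiry time */
	bool pending;
};

/* Circular list with a sentinel head: insert and remove never need to
 * special-case an empty list or search for a node. */
struct demo_callout_list {
	struct demo_callout head;
};

void demo_list_init(struct demo_callout_list *q)
{
	q->head.next = q->head.prev = &q->head;
	q->head.pending = false;
}

/* O(1) unlink: the node already knows both neighbours. */
static void demo_unlink(struct demo_callout *c)
{
	c->prev->next = c->next;
	c->next->prev = c->prev;
	c->next = c->prev = NULL;
	c->pending = false;
}

/* Arm (or re-arm) a callout: O(1) removal if pending, then O(1) tail insert. */
void demo_callout_reset(struct demo_callout_list *q, struct demo_callout *c,
    int ticks)
{
	if (c->pending)
		demo_unlink(c);
	c->ticks = ticks;
	c->prev = q->head.prev;
	c->next = &q->head;
	q->head.prev->next = c;
	q->head.prev = c;
	c->pending = true;
}

/* Disarm a callout in O(1); harmless if it was not pending. */
void demo_callout_stop(struct demo_callout *c)
{
	if (c->pending)
		demo_unlink(c);
}

/* O(n) helper, used only to inspect the list contents. */
int demo_count(const struct demo_callout_list *q)
{
	int n = 0;

	for (const struct demo_callout *c = q->head.next; c != &q->head;
	    c = c->next)
		n++;
	return n;
}
```

Because the caller embeds struct demo_callout in its own data, neither arming nor stopping can fail for lack of memory, which is the resource-allocation problem the commit message refers to.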


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this; it makes the behavior hard to control, almost impossible to analyze,
and the effect (~nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.357 13-Jul-2023 riastradh

kern: Print more detailed monotonic-clock-went-backwards messages.

Let's try harder to track this down.

XXX Should add dtrace probes.


# 1.356 23-Jun-2023 riastradh

tsleep: Comment out kernel lock assertion for now.

Breaks tpm(4) which breaks boot on a lot of systems. tpm(4)
shouldn't be using tsleep; it doesn't appear to even have an
interrupt handler for wakeups, so it could get by with kpause. If it
ever did sprout an interrupt handler it should use condvar(9) anyway.
But for now I don't have time to fix it tonight.


# 1.355 23-Jun-2023 riastradh

tsleep(9): Assert kernel lock held.

This is never safe to use without the kernel lock. It should only
appear in legacy subsystems that still run with the kernel lock.


# 1.354 09-Apr-2023 riastradh

kpause(9): Simplify assertion. No functional change intended.


Revision tags: netbsd-10-base
# 1.353 05-Dec-2022 martin

If no more softints are pending on this cpu, clear ci_want_resched
(instead of just assigning ci_data.cpu_softints to it - the bitsets
are not the same).
Discussed on tech-kern "ci_want_resched bits vs. MD ci_data.cpu_softints bits".


# 1.352 26-Oct-2022 riastradh

kern/kern_synch.c: Get averunnable from sys/resource.h.


Revision tags: bouyer-sunxi-drm-base
# 1.351 29-Jun-2022 riastradh

sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

sleepq_enter(sq, l, lock); // locks l
...
sleepq_enqueue(sq, ..., sobj, ...); // assumes l locked, sets l_syncobj
... (*)
sleepq_block(...); // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com


# 1.350 10-Mar-2022 riastradh

kern: Fix synchronization of clearing LP_RUNNING and lwp_free.

1. membar_sync is not necessary here -- only a store-release is
required.

2. membar_consumer _before_ loading l->l_pflag is not enough; a
load-acquire is required.

Actually it's not really clear to me why any barriers are needed, since
the store-release and load-acquire should be implied by releasing and
acquiring the lwp lock (and maybe we could spin with the lock instead
of reading l->l_pflag unlocked). But maybe there's something subtle
about access to l->l_mutex that's not obvious here.
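
The store-release / load-acquire pairing described in 1.350 can be illustrated with C11 atomics. This is a minimal userland sketch, assuming a made-up struct fake_lwp whose atomic_bool running stands in for LP_RUNNING; the functions switch_away and reap_demo are hypothetical, and none of this is the kernel's code.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an LWP; not NetBSD's struct lwp. */
struct fake_lwp {
	int scratch;		/* plain data written while switching away */
	atomic_bool running;	/* hypothetical stand-in for LP_RUNNING */
};

/* Switching-away side: finish with the LWP, then clear "running" with a
 * store-release so the earlier plain write is ordered before it. */
static void *switch_away(void *arg)
{
	struct fake_lwp *l = arg;

	l->scratch = 42;
	atomic_store_explicit(&l->running, false, memory_order_release);
	/* Must not touch *l after this point: the reaper may free it. */
	return NULL;
}

/* Reaper side: spin with load-acquire until "running" is clear. The
 * acquire pairs with the release above, so l->scratch is visible and
 * freeing the structure is safe. Returns the value observed. */
int reap_demo(void)
{
	struct fake_lwp *l = malloc(sizeof(*l));
	pthread_t t;
	int seen;

	if (l == NULL)
		return -1;
	l->scratch = 0;
	atomic_init(&l->running, true);
	if (pthread_create(&t, NULL, switch_away, l) != 0) {
		free(l);
		return -1;
	}
	while (atomic_load_explicit(&l->running, memory_order_acquire))
		;	/* busy-wait; a kernel would add a CPU pause hint */
	seen = l->scratch;
	free(l);	/* safe before join: the thread no longer uses *l */
	pthread_join(t, NULL);
	return seen;
}
```

A full barrier on the releasing side (membar_sync) would also work but is stronger than needed, and a read barrier placed before the flag load, as point 2 notes, does not order the later reads after it; the acquire load does.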


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.349 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.348 20-May-2020 maxv

future-proof-ness


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.347 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).


Revision tags: phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.346 04-Apr-2020 ad

branches: 1.346.2;
preempt_needed(), preempt_point(): simplify the definition of these and
key on ci_want_resched in the interests of interactive response.


# 1.345 26-Mar-2020 ad

Leave the idle LWPs in state LSIDL even when running, so they don't mess up
output from ps/top/etc. Correctness isn't at stake, LWPs in other states
are temporarily on the CPU at times too (e.g. LSZOMB, LSSLEEP).


# 1.344 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


# 1.343 14-Mar-2020 ad

- Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.342 23-Feb-2020 ad

kpause(): is only awoken via timeout or signal, so use SOBJ_SLEEPQ_NULL like
_lwp_park() does, and dispense with the hashed sleepq & lock.


# 1.341 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.340 16-Feb-2020 ad

nextlwp(): fix a couple of locking bugs including one I introduced yesterday,
and add comments around same.


# 1.339 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


Revision tags: ad-namecache-base2
# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, like with the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an overeager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller of mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread was stopped with ptrace(2), userland process must not
be able to unstop it deliberately or by an accident.

This was a Windows-style behavior that makes threading tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point the suspect thread has already switched away, so a backtrace taken on
a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to a state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag to l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP can be migrated to another CPU, and
in all cases the scheduler postpones a migration if the target LWP is running. One
example of LWP migration is load balancing; the scheduler periodically looks
for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it is set on an LWP,
the scheduler doesn't schedule that LWP. The actual migration of a scheduled LWP
is attempted when it leaves its CPU, i.e., in mi_switch. And mi_switch does NOT check
the LP_BOUND flag. So if an LWP is scheduled first and then it sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptability.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans on the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptable. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forwards branches which are not
usually taken for the old code, and one forwards branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable the asm statement which defines the bpendtsleep symbol as a "handy breakpoint"
on all m68k ports, since the gcc4 optimizer may duplicate the code and cause a
multiple symbol definition error. Also add a note about this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer duplicates the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem in which p_estcpu is doubled
on each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
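To illustrate point #1: with an integer estimate, a one-second decay by the classic 4.4BSD factor (2*load)/(2*load + 1) is always below 1, so truncation drops any nonzero estimate by at least 1 per second no matter how high the load is. The sketch below contrasts that with a fixed-point estimate. It is hypothetical: decay_integer() and decay_fixpt() are invented names, the load average is a plain integer for simplicity, and FSHIFT = 11 mirrors the fixpt_t layout but is an assumption here, not the code from this revision.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point layout: FSHIFT fractional bits, as for fixpt_t. */
#define FSHIFT	11
#define FSCALE	(1U << FSHIFT)

/*
 * One second of decay with an integer estimate: the factor
 * (2*load)/(2*load + 1) is below 1, so truncation always drops
 * a nonzero estimate by at least 1.
 */
static uint32_t
decay_integer(uint32_t estcpu, uint32_t load)
{
	return (uint32_t)((uint64_t)estcpu * (2 * load) / (2 * load + 1));
}

/*
 * The same decay with an estimate carrying FSHIFT fractional bits:
 * only a remainder smaller than 1/FSCALE is truncated.
 */
static uint32_t
decay_fixpt(uint32_t estcpu_fx, uint32_t load)
{
	assert(load <= 1000000);	/* keep 2*load + 1 well inside 32 bits */
	return (uint32_t)((uint64_t)estcpu_fx * (2 * load) / (2 * load + 1));
}
```

With a load average of 20, decay_integer(10, 20) yields 9 (a loss of 1), while decay_fixpt(10 * FSCALE, 20) yields 19980, about 9.76 in fixed point, a loss of only about a quarter.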


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
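The marker-skipping iteration can be sketched with the <sys/queue.h> LIST macros. Everything here is hypothetical: struct xproc, its p_marker flag and XPROCLIST_FOREACH stand in for struct proc and PROCLIST_FOREACH, whose actual definitions in this revision may differ. The idea is that a marker entry presumably lets proclist_foreach_call remember its position in the list, so ordinary loops must step over it.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct proc. */
struct xproc {
	LIST_ENTRY(xproc) p_list;
	int p_pid;
	int p_marker;	/* nonzero for placeholder entries used by an iterator */
};
LIST_HEAD(xproclist, xproc);

/* Like LIST_FOREACH, but the loop body never sees marker entries. */
#define XPROCLIST_FOREACH(var, head)			\
	LIST_FOREACH((var), (head), p_list)		\
		if ((var)->p_marker)			\
			continue;			\
		else

/* Count the real (non-marker) entries on a list. */
static int
count_procs(struct xproclist *head)
{
	struct xproc *p;
	int n = 0;

	assert(head != NULL);
	XPROCLIST_FOREACH(p, head)
		n++;
	return n;
}
```

A list holding two real entries and one marker, in any order, yields a count of 2.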


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.
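The two orderings differ only in which end of the word the highest-priority queue maps to. A hypothetical sketch (function names invented; 32 queues assumed, as in the classic BSD run-queue bitmap): in LSB order the best non-empty queue is the lowest set bit, found with POSIX ffs(); in MSB order it is the count of leading zeros, computed here with the GCC/Clang __builtin_clz(). MSB ordering presumably suits powerpc because it has a count-leading-zeros instruction (cntlzw).

```c
#include <assert.h>
#include <stdint.h>
#include <strings.h>	/* ffs() */

/* LSB order: queue q is bit (1 << q). */
static uint32_t
qbit_lsb(int q)
{
	return UINT32_C(1) << q;
}

/* MSB order: queue q is bit (0x80000000 >> q). */
static uint32_t
qbit_msb(int q)
{
	return UINT32_C(0x80000000) >> q;
}

/* Lowest-numbered (highest-priority) non-empty queue, LSB order. */
static int
firstq_lsb(uint32_t whichqs)
{
	assert(whichqs != 0);
	return ffs((int)whichqs) - 1;
}

/* Same lookup in MSB order: count leading zeros (GCC/Clang builtin). */
static int
firstq_msb(uint32_t whichqs)
{
	assert(whichqs != 0);	/* __builtin_clz(0) is undefined */
	return __builtin_clz(whichqs);
}
```

With queues 5 and 12 non-empty, both lookups return 5; only the bit layout of sched_whichqs differs.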


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get the appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process, so we can't restart scheduling;
this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all cpu's rather
than from a timer callout; this allows time-slicing on non-primary cpu's.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) them.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this: it makes the behavior hard to control, almost impossible to analyze,
and the effect (almost nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.
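The clipping rule from the nice-bug paragraph can be checked numerically. This sketch is hypothetical: the constants (PUSER = 50, PPQ = 4, NICE_WEIGHT = 2, PRIO_MAX = 20) are modeled on the classic BSD scheduler, and user_priority() is an assumed linear formula (base + clipped p_estcpu + NICE_WEIGHT * nice), not the code committed in this revision. With p_estcpu capped at NICE_WEIGHT * PRIO_MAX - PPQ = 36, a compute-bound process at nice 0 stays at least one run queue ahead of an idle process at nice +20.

```c
#include <assert.h>

/* Assumed constants, modeled on the classic BSD scheduler. */
#define PUSER		50	/* base user priority */
#define PPQ		4	/* priorities per run queue */
#define NICE_WEIGHT	2	/* priority points per nice step */
#define PRIO_MAX	20	/* maximum nice value */
#define ESTCPU_MAX	((unsigned)(NICE_WEIGHT * PRIO_MAX - PPQ))

/* Clip the CPU-usage estimate so it can never reach the nice +20 queue. */
static unsigned
estcpu_lim(unsigned estcpu)
{
	return estcpu < ESTCPU_MAX ? estcpu : ESTCPU_MAX;
}

/* Hypothetical priority formula: a larger number means a worse priority. */
static unsigned
user_priority(unsigned estcpu, unsigned nice)
{
	assert(nice <= PRIO_MAX);
	return PUSER + estcpu_lim(estcpu) + NICE_WEIGHT * nice;
}

/* Run queue index for a priority. */
static unsigned
runqueue_of(unsigned pri)
{
	return pri / PPQ;
}
```

Here a saturated nice 0 process gets priority 86 (queue 21) and an idle nice +20 process gets 90 (queue 22). Without the cap (for example, clipping only at UCHAR_MAX), an estimate of 255 would give priority 305 under this formula, much worse than the nice +20 process.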


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
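For illustration, the NZERO bias can be sketched in C. This is a minimal sketch, assuming a nice range of -20..20; SKETCH_NZERO and the helpers nice_to_pnice() and pnice_to_nice() are hypothetical names, not the kernel's code.

```c
#include <assert.h>

/* Hypothetical stand-ins for NZERO and the user-visible nice range. */
#define SKETCH_NZERO	20
#define SKETCH_NICE_MIN	(-20)
#define SKETCH_NICE_MAX	20

/* Store a signed nice value biased by NZERO so it fits in a u_char. */
static unsigned char
nice_to_pnice(int nice)
{
	assert(nice >= SKETCH_NICE_MIN && nice <= SKETCH_NICE_MAX);
	return (unsigned char)(nice + SKETCH_NZERO);
}

/* Recover the user-visible nice value from the biased field. */
static int
pnice_to_nice(unsigned char p_nice)
{
	return (int)p_nice - SKETCH_NZERO;
}
```

With the bias in place, the most favourable nice value (-20) is stored as 0 and the least favourable (20) as 40, so an unsigned char never needs a sign bit.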


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.356 23-Jun-2023 riastradh

tsleep: Comment out kernel lock assertion for now.

Breaks tpm(4) which breaks boot on a lot of systems. tpm(4)
shouldn't be using tsleep; it doesn't appear to even have an
interrupt handler for wakeups, so it could get by with kpause. If it
ever did sprout an interrupt handler it should use condvar(9) anyway.
But for now I don't have time to fix it tonight.


# 1.355 23-Jun-2023 riastradh

tsleep(9): Assert kernel lock held.

This is never safe to use without the kernel lock. It should only
appear in legacy subsystems that still run with the kernel lock.


# 1.354 09-Apr-2023 riastradh

kpause(9): Simplify assertion. No functional change intended.


Revision tags: netbsd-10-base
# 1.353 05-Dec-2022 martin

If no more softints are pending on this cpu, clear ci_want_resched
(instead of just assigning ci_data.cpu_softints to it - the bitsets
are not the same).
Discussed on tech-kern "ci_want_resched bits vs. MD ci_data.cpu_softints bits".


# 1.352 26-Oct-2022 riastradh

kern/kern_synch.c: Get averunnable from sys/resource.h.


Revision tags: bouyer-sunxi-drm-base
# 1.351 29-Jun-2022 riastradh

sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

sleepq_enter(sq, l, lock); // locks l
...
sleepq_enqueue(sq, ..., sobj, ...); // assumes l locked, sets l_syncobj
... (*)
sleepq_block(...); // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com


# 1.350 10-Mar-2022 riastradh

kern: Fix synchronization of clearing LP_RUNNING and lwp_free.

1. membar_sync is not necessary here -- only a store-release is
required.

2. membar_consumer _before_ loading l->l_pflag is not enough; a
load-acquire is required.

Actually it's not really clear to me why any barriers are needed, since
the store-release and load-acquire should be implied by releasing and
acquiring the lwp lock (and maybe we could spin with the lock instead
of reading l->l_pflag unlocked). But maybe there's something subtle
about access to l->l_mutex that's not obvious here.
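The store-release/load-acquire pairing described in points 1 and 2 can be sketched with C11 atomics in user space. This is a hedged illustration only: struct lwp_sketch, SK_LP_RUNNING and the helper functions are hypothetical stand-ins for l_pflag and LP_RUNNING, and a pthread plays the part of the LWP switching away.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define SK_LP_RUNNING	0x1u	/* hypothetical "still on a CPU" flag */

struct lwp_sketch {
	int		state;	/* ordinary data written before switching away */
	atomic_uint	pflag;	/* flag word, accessed atomically */
};

/* Switching side: finish all plain writes, then clear the flag with release. */
static void *
switch_away(void *arg)
{
	struct lwp_sketch *l = arg;

	assert(atomic_load_explicit(&l->pflag, memory_order_relaxed) &
	    SK_LP_RUNNING);
	l->state = 42;
	atomic_fetch_and_explicit(&l->pflag, ~SK_LP_RUNNING,
	    memory_order_release);
	return NULL;
}

/* Freeing side: spin with acquire until the flag clears, then read the data. */
static int
wait_then_free(struct lwp_sketch *l)
{
	while (atomic_load_explicit(&l->pflag, memory_order_acquire) &
	    SK_LP_RUNNING)
		continue;
	return l->state;	/* ordered after the writes in switch_away() */
}

/* Demo driver: returns the value the freeing side observed, or -1. */
static int
run_sketch(void)
{
	struct lwp_sketch l;
	pthread_t t;
	int seen;

	l.state = 0;
	atomic_init(&l.pflag, SK_LP_RUNNING);
	if (pthread_create(&t, NULL, switch_away, &l) != 0)
		return -1;
	seen = wait_then_free(&l);
	pthread_join(t, NULL);
	return seen;
}
```

Because the acquire load that sees the flag clear reads the value written by the release operation, every write made before the flag was cleared is visible once the loop exits; a full membar_sync is not needed for that guarantee.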


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.349 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.348 20-May-2020 maxv

future-proof-ness


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.347 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).


Revision tags: phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.346 04-Apr-2020 ad

branches: 1.346.2;
preempt_needed(), preempt_point(): simplify the definition of these and
key on ci_want_resched in the interests of interactive response.


# 1.345 26-Mar-2020 ad

Leave the idle LWPs in state LSIDL even when running, so they don't mess up
output from ps/top/etc. Correctness isn't at stake, LWPs in other states
are temporarily on the CPU at times too (e.g. LSZOMB, LSSLEEP).


# 1.344 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


# 1.343 14-Mar-2020 ad

- Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.342 23-Feb-2020 ad

kpause(): is only awoken via timeout or signal, so use SOBJ_SLEEPQ_NULL like
_lwp_park() does, and dispense with the hashed sleepq & lock.


# 1.341 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.340 16-Feb-2020 ad

nextlwp(): fix a couple of locking bugs including one I introduced yesterday,
and add comments around same.


# 1.339 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


Revision tags: ad-namecache-base2
# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, like with the Rockwell RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread was stopped with ptrace(2), userland process must not
be able to unstop it deliberately or by an accident.

This was a Windows-style behavior that makes threading tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point a suspect thread is already switched so that a backtrace gotten on
a KASSERT failure doesn't point out where a context switch happens.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag to l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, an LWP can be migrated to another CPU in several ways, and in
every case the scheduler postpones the migration if the target LWP is running.
One example is load balancing: the scheduler periodically looks for
CPU-hogging LWPs and schedules them for migration (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it is set on an
LWP, the scheduler does not schedule that LWP. A scheduled LWP is actually
migrated when it leaves its CPU, i.e., in mi_switch, and mi_switch does NOT
check the LP_BOUND flag. So if an LWP is scheduled for migration first and
then sets LP_BOUND, it can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
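The following sketch shows how the three fields could be folded back into a wait(2) status for a terminated process. It assumes the conventional encoding (exit code in bits 8-15, terminating signal in the low 7 bits, 0x80 as the core-dump bit); compose_wait_status(), decode_wait_status() and SK_WCOREFLAG are hypothetical names, and stopped processes, which use a different encoding, are not handled.

```c
#include <assert.h>
#include <sys/wait.h>

#define SK_WCOREFLAG	0x80	/* conventional core-dump bit */

/* Hypothetical: rebuild a wait(2) status from the split exit/signal/core fields. */
static int
compose_wait_status(int xexit, int xsig, int coredumped)
{
	assert(xsig >= 0 && xsig < 0x7f);	/* 0x7f marks "stopped" */
	if (xsig == 0)
		return (xexit & 0xff) << 8;	/* normal exit */
	return xsig | (coredumped ? SK_WCOREFLAG : 0);	/* killed by signal */
}

/* Decode with the standard macros: exit code if exited, -signal if killed. */
static int
decode_wait_status(int status)
{
	if (WIFEXITED(status))
		return WEXITSTATUS(status);
	if (WIFSIGNALED(status))
		return -WTERMSIG(status);
	return 0;
}
```

Keeping the three pieces separate and composing the status only when wait(2) reports it avoids having to guess, at every use site, which interpretation of a single composite field is in effect.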


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@
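The usual portable fix, sketched below, is to widen the value to intmax_t and print it with %jd rather than %ld. format_time() is a hypothetical helper; the commit does not say which conversion was used in kern_synch.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper: format a time_t without assuming its width.
 * time_t is an integer type whose size varies by platform, so widen
 * it to intmax_t and use the matching %jd conversion.
 */
static int
format_time(char *buf, size_t len, time_t t)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%jd", (intmax_t)t);
}
```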


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html
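A minimal user-space sketch of such a variadic assertion macro follows. SKETCH_KASSERTMSG and sketch_assert_failed() are hypothetical, and the kernel's real KASSERTMSG is not reproduced here; instead of panicking, the failure path formats the would-be panic message into a buffer so the behaviour can be observed.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char sketch_panicbuf[256];

/*
 * Hypothetical stand-in for panic(): rather than halting, record
 * "assertion \"<expr>\" failed: <message>" so the sketch runs in user space.
 */
static void
sketch_assert_failed(const char *expr, const char *fmt, ...)
{
	va_list ap;
	size_t n;

	assert(expr != NULL && fmt != NULL);
	snprintf(sketch_panicbuf, sizeof(sketch_panicbuf),
	    "assertion \"%s\" failed: ", expr);
	n = strlen(sketch_panicbuf);	/* always < sizeof(sketch_panicbuf) */
	va_start(ap, fmt);
	vsnprintf(sketch_panicbuf + n, sizeof(sketch_panicbuf) - n, fmt, ap);
	va_end(ap);
}

/*
 * Variadic form: the message is a printf-style format followed by
 * optional arguments, so both SKETCH_KASSERTMSG(p != NULL, "no p") and
 * SKETCH_KASSERTMSG(n < max, "n=%d max=%d", n, max) are accepted.
 */
#define SKETCH_KASSERTMSG(e, ...)					\
	((e) ? (void)0 : sketch_assert_failed(#e, __VA_ARGS__))
```

Folding the format string into __VA_ARGS__ keeps the macro valid C99 even when no extra arguments are passed, since the variadic part is then never empty.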


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptability.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit (once
hz < n, hz/n truncates to 0). With a usec2ticks() routine it would
at least be a little more tolerable.
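To make the truncation concrete: with hz = 100, hz / 200 evaluates to 0, which kpause() reads as "no timeout". The sketch below converts milliseconds to ticks and rounds up, so a nonzero request never becomes 0. ms_to_ticks() is a hypothetical helper (NetBSD has its own mstohz(9), whose rounding is not assumed here), and overflow for very large inputs is not handled.

```c
#include <assert.h>

/*
 * Hypothetical helper: convert a duration in milliseconds to clock ticks,
 * rounding up so that any nonzero request yields at least one tick
 * (a result of 0 would mean "sleep forever" to the caller).
 */
static int
ms_to_ticks(int ms, int hz)
{
	assert(ms >= 0 && hz > 0);
	return (int)(((long long)ms * hz + 999) / 1000);
}
```

For example, ms_to_ticks(5, 100) yields 1 tick, where hz / 200 would have yielded 0.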


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync
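A sketch of what such a check protects, modeled on the classic BSD load-average update: one decay factor in cexp[] per entry in ldavg[], applied every 5 seconds. C11 _Static_assert stands in for CTASSERT; struct loadavg_sketch, loadav_update() and loadav_after() are hypothetical names, and the decay constants are exp(-5/60), exp(-5/300) and exp(-5/900) in fixed point.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t fixpt_t;		/* unsigned fixed-point value */
#define FSHIFT	11			/* bits to the right of the point */
#define FSCALE	(1 << FSHIFT)

/* Decay factors for the 1, 5 and 15 minute averages, sampled every 5 s. */
static const fixpt_t cexp[] = {
	0.9200444146293232 * FSCALE,	/* exp(-1/12) */
	0.9834714538216174 * FSCALE,	/* exp(-1/60) */
	0.9944598480048967 * FSCALE,	/* exp(-1/180) */
};

struct loadavg_sketch {
	fixpt_t	ldavg[3];
};

/* Compile-time check that the two arrays stay the same length. */
_Static_assert(sizeof(cexp) / sizeof(cexp[0]) ==
    sizeof(((struct loadavg_sketch *)0)->ldavg) / sizeof(fixpt_t),
    "cexp and ldavg must have the same number of entries");

/* Fold the current run-queue length into each exponential average. */
static void
loadav_update(struct loadavg_sketch *avg, unsigned nrun)
{
	for (unsigned i = 0; i < sizeof(cexp) / sizeof(cexp[0]); i++)
		avg->ldavg[i] = (cexp[i] * avg->ldavg[i] +
		    nrun * FSCALE * (FSCALE - cexp[i])) >> FSHIFT;
}

/* Demo helper: run `rounds` updates from zero with a constant nrun. */
static fixpt_t
loadav_after(unsigned nrun, int rounds, unsigned which)
{
	struct loadavg_sketch avg = { { 0, 0, 0 } };

	assert(which < sizeof(avg.ldavg) / sizeof(avg.ldavg[0]));
	while (rounds-- > 0)
		loadav_update(&avg, nrun);
	return avg.ldavg[which];
}
```

Adding a fourth average to ldavg[] without a matching cexp[] entry would then fail at compile time instead of leaving the arrays silently out of step.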


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@).

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- A few branch predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
it will be handled at EOI anyway. This can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodic balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into a separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspiration for balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is prepared for CPU affinity support.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches which are not
usually taken for the old code, and one forward branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symble definition error
by code duplication of gcc4 optimizer. Also note about this in comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu are not likely increased.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve comment for ltsleep(), the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get on appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes except curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the P_SUSPEND
flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all cpu's rather
than from a timer callout; this allows time-slicing on non-primary cpu's.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
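The race that the interlock closes can be sketched in userland with POSIX threads. This is an analogy, not ltsleep() itself: pthread_cond_wait() atomically releases its mutex only once the caller is queued on the condition variable, which is the same guarantee the interlock argument gives a sleeper. The names interlock, chan, ready and interlocked_wait_demo() are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state standing in for a wait channel. */
static pthread_mutex_t interlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  chan      = PTHREAD_COND_INITIALIZER;
static int             ready;    /* condition the sleeper waits for */

/* "Awakener": sets the condition and signals while holding the
 * interlock, so it cannot slip in between the sleeper's test and
 * the sleeper going to sleep (the lost-wakeup race). */
static void *awakener(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&interlock);
	ready = 1;
	pthread_cond_signal(&chan);
	pthread_mutex_unlock(&interlock);
	return NULL;
}

/* "Sleeper": tests the condition under the interlock and sleeps;
 * pthread_cond_wait() drops the interlock only after queueing. */
int interlocked_wait_demo(void)
{
	pthread_t t;
	int seen;

	ready = 0;
	pthread_mutex_lock(&interlock);
	if (pthread_create(&t, NULL, awakener, NULL) != 0) {
		pthread_mutex_unlock(&interlock);
		return -1;
	}
	while (!ready)                 /* re-check: wakeups may be spurious */
		pthread_cond_wait(&chan, &interlock);
	seen = ready;
	pthread_mutex_unlock(&interlock);
	pthread_join(t, NULL);
	return seen;                   /* 1 once the awakener has run */
}
```

Build with -pthread. Without the interlock, the awakener could signal after the sleeper tested `ready` but before it slept, and the wakeup would be lost.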


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.
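A bitmap such as sched_whichqs lets the scheduler find the best non-empty run queue with a single find-first-set operation instead of scanning every queue. The sketch below assumes 32 queues, with lower queue numbers meaning higher priority (the BSD convention); whichqs, qlen and the runq_* functions are hypothetical names, not the kernel's.

```c
#include <stdint.h>
#include <strings.h>   /* ffs() */

#define NQS 32         /* assumed number of run queues */

/* Hypothetical analogue of sched_whichqs: bit q set <=> queue q non-empty. */
static uint32_t whichqs;
static int      qlen[NQS];

void runq_insert(int q)
{
	if (qlen[q]++ == 0)
		whichqs |= (uint32_t)1 << q;
}

void runq_remove(int q)
{
	if (--qlen[q] == 0)
		whichqs &= ~((uint32_t)1 << q);
}

/* Lowest-numbered (highest-priority) non-empty queue, or -1 if none.
 * ffs() returns the 1-based index of the least significant set bit. */
int runq_best(void)
{
	return ffs((int)whichqs) - 1;
}
```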


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
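Both improvements follow from embedding the list linkage in a record the client owns. The sketch below is not NetBSD's callout(9) code, which may organize pending callouts differently (for example in per-tick buckets); it only shows why caller-supplied storage avoids allocation and why a doubly linked <sys/queue.h> TAILQ gives O(1) insertion and removal. All struct and function names are hypothetical.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical callout record: the client owns the storage,
 * so scheduling a callout never allocates memory. */
struct callout {
	TAILQ_ENTRY(callout) c_link;   /* embedded list linkage */
	int    c_ticks;                /* ticks until it fires */
	void (*c_func)(void *);
	void  *c_arg;
	int    c_pending;
};

TAILQ_HEAD(callout_list, callout);
static struct callout_list pending = TAILQ_HEAD_INITIALIZER(pending);

/* O(1): (re)schedule by unlinking if needed, then appending. */
void callout_schedule_sketch(struct callout *c, int ticks,
    void (*f)(void *), void *a)
{
	if (c->c_pending)
		TAILQ_REMOVE(&pending, c, c_link);
	c->c_ticks = ticks;
	c->c_func = f;
	c->c_arg = a;
	c->c_pending = 1;
	TAILQ_INSERT_TAIL(&pending, c, c_link);
}

/* O(1): the embedded links reach both neighbours directly. */
void callout_stop_sketch(struct callout *c)
{
	if (c->c_pending) {
		TAILQ_REMOVE(&pending, c, c_link);
		c->c_pending = 0;
	}
}

int callout_count_sketch(void)
{
	struct callout *c;
	int n = 0;

	TAILQ_FOREACH(c, &pending, c_link)
		n++;
	return n;
}
```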


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.
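The difference between wakeup() and wakeup_one() can be modelled over a flat array of sleepers. This is a hypothetical model, not the kernel's sleep queues: it reads "highest priority process first in line" as the best (numerically lowest) priority on the channel, with ties going to whoever has waited longest.

```c
#include <stddef.h>

/* Hypothetical sleep-queue entry; lower prio = higher priority. */
struct sleeper {
	const void *wchan;   /* wait channel identifier */
	int         prio;
	int         awake;
};

/* Wake every sleeper on chan (the "thundering herd"); returns count. */
int wakeup_all_sketch(struct sleeper *q, size_t n, const void *chan)
{
	int woken = 0;

	for (size_t i = 0; i < n; i++)
		if (!q[i].awake && q[i].wchan == chan) {
			q[i].awake = 1;
			woken++;
		}
	return woken;
}

/* Wake only the best-priority sleeper on chan; on a tie the one that
 * has been in line longest (lowest index) wins. Returns index or -1. */
int wakeup_one_sketch(struct sleeper *q, size_t n, const void *chan)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (q[i].awake || q[i].wchan != chan)
			continue;
		if (best < 0 || q[i].prio < q[best].prio)
			best = (int)i;
	}
	if (best >= 0)
		q[best].awake = 1;
	return best;
}
```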


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left are the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this; it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing at all for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.
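The clipping bound from the "nice bug" paragraph can be checked with a little arithmetic. NICE_WEIGHT = 2 and the bound NICE_WEIGHT * PRIO_MAX - PPQ come from the entry; PRIO_MAX = 20, PPQ = 4 and PUSER = 50 are assumed values, and user_priority() is a simplified model (unscaled p_estcpu plus weighted nice offset), not the kernel's resetpriority().

```c
/* NICE_WEIGHT and the clip bound come from the log entry; the other
 * constants are assumptions for illustration only. */
#define NICE_WEIGHT 2
#define PRIO_MAX    20   /* assumed: largest nice offset (+20) */
#define PPQ         4    /* assumed: priorities per run queue */
#define PUSER       50   /* assumed: base user priority */

/* Clip bound for p_estcpu: NICE_WEIGHT * PRIO_MAX - PPQ (= 36 here). */
int estcpu_limit(void)
{
	return NICE_WEIGHT * PRIO_MAX - PPQ;
}

/* Simplified model: estcpu used unscaled, nice offset 0 (default)
 * to PRIO_MAX (+20). Larger result = worse priority. */
int user_priority(int estcpu, int nice)
{
	if (estcpu > estcpu_limit())
		estcpu = estcpu_limit();
	return PUSER + estcpu + NICE_WEIGHT * nice;
}

int run_queue(int pri)
{
	return pri / PPQ;
}
```

Under these assumptions the most CPU-hungry nice 0 process tops out at priority 50 + 36 = 86 (queue 21), while an idle nice +20 process starts at 50 + 40 = 90 (queue 22), so the clip keeps the two from sharing a queue.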


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic niceing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.354 09-Apr-2023 riastradh

kpause(9): Simplify assertion. No functional change intended.


Revision tags: netbsd-10-base
# 1.353 05-Dec-2022 martin

If no more softints are pending on this cpu, clear ci_want_resched
(instead of just assigning ci_data.cpu_softints to it - the bitsets
are not the same).
Discussed on tech-kern "ci_want_resched bits vs. MD ci_data.cpu_softints bits".


# 1.352 26-Oct-2022 riastradh

kern/kern_synch.c: Get averunnable from sys/resource.h.


Revision tags: bouyer-sunxi-drm-base
# 1.351 29-Jun-2022 riastradh

sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

sleepq_enter(sq, l, lock); // locks l
...
sleepq_enqueue(sq, ..., sobj, ...); // assumes l locked, sets l_syncobj
... (*)
sleepq_block(...); // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com
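The fix can be reduced to a deterministic model: a decision that re-reads l_syncobj after a window in which another thread may have reset it, versus one that uses the syncobj the caller enqueued with. Every name below (struct syncobj_model, may_record_csw(), block_rereads(), block_passed()) is hypothetical; only the idea of passing the syncobj through to the blocking step comes from the entry.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for syncobj_t instances. */
struct syncobj_model { bool is_mutex; };
static const struct syncobj_model mutex_sobj = { true };
static const struct syncobj_model sched_sobj = { false };

struct lwp_model { const struct syncobj_model *l_syncobj; };

/* ktrcsw-like decision: may we allocate a context-switch record?
 * Never for mutex sleeps, since those are allowed in softint context. */
static bool may_record_csw(const struct syncobj_model *sobj)
{
	return !sobj->is_mutex;
}

/* Old pattern: re-read l_syncobj, which a concurrent sleepq_remove()
 * may already have reset to the scheduler's syncobj. */
bool block_rereads(const struct lwp_model *l)
{
	return may_record_csw(l->l_syncobj);
}

/* New pattern: decide from the syncobj the caller enqueued with. */
bool block_passed(const struct lwp_model *l, const struct syncobj_model *sobj)
{
	(void)l;
	return may_record_csw(sobj);
}
```

In the test scenario the LWP sleeps on a mutex, another thread resets l_syncobj in the window, and only the re-reading version wrongly permits the allocation.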


# 1.350 10-Mar-2022 riastradh

kern: Fix synchronization of clearing LP_RUNNING and lwp_free.

1. membar_sync is not necessary here -- only a store-release is
required.

2. membar_consumer _before_ loading l->l_pflag is not enough; a
load-acquire is required.

Actually it's not really clear to me why any barriers are needed, since
the store-release and load-acquire should be implied by releasing and
acquiring the lwp lock (and maybe we could spin with the lock instead
of reading l->l_pflag unlocked). But maybe there's something subtle
about access to l->l_mutex that's not obvious here.
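The store-release/load-acquire pairing described in points 1 and 2 can be sketched with C11 atomics and POSIX threads. This is a userland analogy with hypothetical names (struct obj, owner(), reap_demo()); the `running` flag stands in for LP_RUNNING. The owner's last action on the object is a release store clearing the flag, and the reaper may free the object only after an acquire load observes the flag cleared.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical object standing in for an LWP being torn down. */
struct obj {
	int        payload;
	atomic_int running;   /* stands in for LP_RUNNING */
};

/* Owner: finishes with the object, then clears "running" with a
 * store-release so its earlier writes are visible before the flag.
 * It must not touch the object after this store. */
static void *owner(void *arg)
{
	struct obj *o = arg;

	o->payload = 42;
	atomic_store_explicit(&o->running, 0, memory_order_release);
	return NULL;
}

/* Reaper: spins with load-acquire until the flag clears; after that
 * the owner's writes are visible and the object may be freed. */
int reap_demo(void)
{
	struct obj *o = malloc(sizeof(*o));
	pthread_t t;
	int v;

	if (o == NULL)
		return -1;
	o->payload = 0;
	atomic_init(&o->running, 1);
	if (pthread_create(&t, NULL, owner, o) != 0) {
		free(o);
		return -1;
	}
	while (atomic_load_explicit(&o->running, memory_order_acquire) != 0)
		;   /* spin, like waiting for the LWP to switch away */
	v = o->payload;
	free(o);
	pthread_join(t, NULL);
	return v;
}
```

Build with -pthread. A full barrier (as with membar_sync) would also work but is stronger than needed; a plain load followed by a consumer barrier would not give the reaper's later reads the ordering they require.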


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.349 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.348 20-May-2020 maxv

future-proof-ness


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.347 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).


Revision tags: phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.346 04-Apr-2020 ad

branches: 1.346.2;
preempt_needed(), preempt_point(): simplify the definition of these and
key on ci_want_resched in the interests of interactive response.


# 1.345 26-Mar-2020 ad

Leave the idle LWPs in state LSIDL even when running, so they don't mess up
output from ps/top/etc. Correctness isn't at stake, LWPs in other states
are temporarily on the CPU at times too (e.g. LSZOMB, LSSLEEP).


# 1.344 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


# 1.343 14-Mar-2020 ad

- Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.342 23-Feb-2020 ad

kpause(): is only awoken via timeout or signal, so use SOBJ_SLEEPQ_NULL like
_lwp_park() does, and dispense with the hashed sleepq & lock.


# 1.341 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.340 16-Feb-2020 ad

nextlwp(): fix a couple of locking bugs including one I introduced yesterday,
and add comments around same.


# 1.339 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


Revision tags: ad-namecache-base2
# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
systems where there are fast CPUs and slow CPUs, like the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over-eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread has been stopped with ptrace(2), the userland process must
not be able to unstop it, deliberately or by accident.

This was Windows-style behavior that made thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Move the check that a context switch does not happen within a pserialize read section

The previous location (pserialize_switchpoint) was a poor choice because by
that point the suspect thread has already been switched out, so a backtrace
taken on a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on an SMP machine with few processes running (e.g. while
running sysinst), pserialize could hang for a long time until all
CPUs got an LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never reached a state where it would be
used and usable, due to reliability problems and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP can be migrated to another CPU,
and in all cases the scheduler postpones the migration if the target LWP is
running. One example is load balancing: the scheduler periodically looks for
CPU-hogging LWPs and schedules them for migration (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it is set on an
LWP, the scheduler doesn't schedule that LWP. Migration of a scheduled LWP is
attempted when it leaves its CPU, i.e., in mi_switch, and mi_switch does NOT
check the LP_BOUND flag. So if an LWP is scheduled for migration first and
only then sets the LP_BOUND flag, it can be migrated regardless of the flag.
To avoid this race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT that we don't kpause indefinitely without interruptibility.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu; it should already be correct on return (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- A few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and it
will be handled at EOI anyway. This can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodic balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into a separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case when l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlewlp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2, the implementation of a new scheduler based on the
original SVR4 approach, with some ideas about balancing and migration
taken from Solaris. It implements per-CPU runqueues, provides real-time
(RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is prepared to support CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlewlp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forwards branches which are not
usually taken for the old code, and one forwards branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symbol definition error
due to code duplication by the gcc4 optimizer. Also add a note about this
in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
on each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve the comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get on appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes except curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all CPUs rather
than from a timer callout; this allows time-slicing on non-primary CPUs.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* function need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.
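A minimal C sketch of the round-robin rule, as later explained in 1.322 (the yield flag means only that the same process was seen on two consecutive round-robin intervals). The names SKETCH_SEENRR and SKETCH_SHOULDYIELD, their bit values, and the helper functions are hypothetical; only the two-interval idea comes from this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU flag bits (modelled on SPCF_SHOULDYIELD). */
#define SKETCH_SEENRR      0x1u	/* seen on one round-robin interval */
#define SKETCH_SHOULDYIELD 0x2u	/* seen on two in a row: should yield */

struct sched_sketch {
	unsigned int flags;	/* per-CPU scheduler flags */
};

/* Called once per round-robin interval while the same process keeps the CPU. */
static void
roundrobin_tick(struct sched_sketch *spc)
{
	assert(spc != NULL);
	if (spc->flags & SKETCH_SEENRR)
		spc->flags |= SKETCH_SHOULDYIELD;	/* second interval in a row */
	else
		spc->flags |= SKETCH_SEENRR;		/* first interval: just note it */
}

/* Called on a context switch: the incoming process starts with a clean slate. */
static void
context_switch(struct sched_sketch *spc)
{
	spc->flags &= ~(SKETCH_SEENRR | SKETCH_SHOULDYIELD);
}

static bool
should_yield(const struct sched_sketch *spc)
{
	return (spc->flags & SKETCH_SHOULDYIELD) != 0;
}
```

A process that yields voluntarily before the second tick never gets marked, so only CPU hogs are asked to give up the processor.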


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
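A minimal C sketch of the two properties listed above, assuming an intrusive doubly-linked list: the caller owns the handle (so arming never allocates), and both arming and stopping are O(1) pointer updates. All names here are hypothetical and this is not the callout(9) implementation; the log does not describe how expiry is found, and this unsorted list would need an O(n) scan for that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callout handle; the client supplies the storage. */
struct callout_sketch {
	struct callout_sketch *next, *prev;	/* links in the pending list */
	bool pending;
	int ticks;				/* time until expiry */
	void (*func)(void *);
	void *arg;
};

/* Sentinel-headed circular list of pending callouts. */
struct callout_list_sketch {
	struct callout_sketch head;
};

static void
callout_list_init(struct callout_list_sketch *cl)
{
	cl->head.next = cl->head.prev = &cl->head;
}

/* O(1): unlink the handle from wherever it currently sits. */
static void
callout_sketch_stop(struct callout_sketch *c)
{
	if (!c->pending)
		return;
	c->prev->next = c->next;
	c->next->prev = c->prev;
	c->next = c->prev = NULL;
	c->pending = false;
}

/* O(1): (re)arm by appending to the tail; no memory allocation. */
static void
callout_sketch_reset(struct callout_list_sketch *cl, struct callout_sketch *c,
    int ticks, void (*func)(void *), void *arg)
{
	assert(ticks >= 0);
	callout_sketch_stop(c);
	c->ticks = ticks;
	c->func = func;
	c->arg = arg;
	c->prev = cl->head.prev;
	c->next = &cl->head;
	cl->head.prev->next = c;
	cl->head.prev = c;
	c->pending = true;
}

static size_t
callout_list_count(const struct callout_list_sketch *cl)
{
	size_t n = 0;

	for (const struct callout_sketch *c = cl->head.next; c != &cl->head;
	    c = c->next)
		n++;
	return n;
}
```

Handles must start zeroed (not pending) before first use.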


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.
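A minimal C sketch of the wakeup_one() selection rule described above: among sleepers on one wait channel, wake only the best-priority one, preferring the earliest in line on a tie, while wakeup() would wake them all. The struct, the flat array, and the convention that a lower number means higher priority (as in BSD) are assumptions for illustration; the real sleep queues are not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sleeper record. */
struct sleeper_sketch {
	const void *wchan;	/* wait channel; NULL once awakened */
	int priority;		/* lower value = higher priority (assumed) */
};

/* Wake only the best-priority sleeper on wchan; on a tie, the one earliest
 * in the queue (first in line) wins.  Returns its index, or -1 if none. */
static int
wakeup_one_sketch(struct sleeper_sketch *q, size_t n, const void *wchan)
{
	int best = -1;

	assert(wchan != NULL);
	for (size_t i = 0; i < n; i++) {
		if (q[i].wchan != wchan)
			continue;
		/* strict '<' keeps the earlier sleeper on equal priority */
		if (best < 0 || q[i].priority < q[best].priority)
			best = (int)i;
	}
	if (best >= 0)
		q[best].wchan = NULL;	/* made runnable; the rest keep sleeping */
	return best;
}
```

Waking a single sleeper avoids the thundering herd when only one of them can make progress anyway.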


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
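A minimal C sketch of the zombie test and the reaper transition described above. The state names come from this log; their numeric values and the struct are assumptions, and the macro is only modelled on the P_ZOMBIE() this entry mentions.

```c
#include <assert.h>
#include <stdbool.h>

/* States named in this log; the numeric values are assumed. */
enum pstat_sketch { SRUN = 1, SSTOP, SZOMB, SDEAD };

struct proc_sketch {
	enum pstat_sketch p_stat;
};

/* Dead but not yet reaped (SDEAD) and reaped, awaiting wait4 (SZOMB)
 * are both treated as zombies. */
#define P_ZOMBIE(p)	((p)->p_stat == SZOMB || (p)->p_stat == SDEAD)

/* The reaper's transition: SDEAD -> SZOMB, after which wait4 can collect it. */
static void
reap_sketch(struct proc_sketch *p)
{
	assert(p->p_stat == SDEAD);
	p->p_stat = SZOMB;
}
```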


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)
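A minimal C sketch contrasting the old no-op clamp with the limit named above, NICE_WEIGHT * PRIO_MAX - PPQ. NICE_WEIGHT = 2 is stated in this entry; PRIO_MAX = 20 and PPQ = 4 (128 priorities spread over 32 run queues) are assumed values, and the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define NICE_WEIGHT	2	/* stated in the log */
#define SKETCH_PRIO_MAX	20	/* assumed */
#define PPQ		4	/* assumed: priorities per run queue */

/* Old behaviour: clamping an unsigned char to UCHAR_MAX can never change it. */
static unsigned char
estcpu_clip_old(unsigned char estcpu)
{
	return estcpu > UCHAR_MAX ? UCHAR_MAX : estcpu;	/* always a no-op */
}

/* Fixed behaviour: keep p_estcpu out of the queues reserved for nice +20. */
static unsigned int
estcpu_clip_new(unsigned int estcpu)
{
	const unsigned int lim = NICE_WEIGHT * SKETCH_PRIO_MAX - PPQ;	/* 36 */
	unsigned int r = estcpu < lim ? estcpu : lim;

	assert(r <= lim);
	return r;
}
```

With these values, a compute-bound process can penalize itself by at most 36, which stays one queue's worth (PPQ) short of the nice +20 contribution.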

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broke.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a load average of one. I killed
this: it makes the behavior hard to control, almost impossible to analyze,
and the effect (almost nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic niceing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.353 05-Dec-2022 martin

If no more softints are pending on this cpu, clear ci_want_resched
(instead of just assigning ci_data.cpu_softints to it - the bitsets
are not the same).
Discussed on tech-kern "ci_want_resched bits vs. MD ci_data.cpu_softints bits".


# 1.352 26-Oct-2022 riastradh

kern/kern_synch.c: Get averunnable from sys/resource.h.


Revision tags: bouyer-sunxi-drm-base
# 1.351 29-Jun-2022 riastradh

sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

sleepq_enter(sq, l, lock); // locks l
...
sleepq_enqueue(sq, ..., sobj, ...); // assumes l locked, sets l_syncobj
... (*)
sleepq_block(...); // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com


# 1.350 10-Mar-2022 riastradh

kern: Fix synchronization of clearing LP_RUNNING and lwp_free.

1. membar_sync is not necessary here -- only a store-release is
required.

2. membar_consumer _before_ loading l->l_pflag is not enough; a
load-acquire is required.

Actually it's not really clear to me why any barriers are needed, since
the store-release and load-acquire should be implied by releasing and
acquiring the lwp lock (and maybe we could spin with the lock instead
of reading l->l_pflag unlocked). But maybe there's something subtle
about access to l->l_mutex that's not obvious here.
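A minimal C11 sketch of the pairing described above: the LWP that is switching away clears the running flag with a store-release after its last writes, and the freeing side uses a load-acquire, so observing the flag clear also guarantees those earlier writes are visible. The struct, field and flag value are hypothetical stand-ins for l->l_pflag and LP_RUNNING; <stdatomic.h> is used in place of the kernel's atomic primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_LP_RUNNING	0x1u	/* hypothetical bit value */

struct lwp_sketch {
	int l_state;			/* plain data written before the release */
	_Atomic unsigned int l_pflag;
};

/* Switching-away side: finish all writes, then clear the running flag with
 * release ordering so those writes are published before the flag change. */
static void
lwp_stop_running(struct lwp_sketch *l, int final_state)
{
	unsigned int old;

	l->l_state = final_state;
	old = atomic_fetch_and_explicit(&l->l_pflag, ~SKETCH_LP_RUNNING,
	    memory_order_release);
	assert(old & SKETCH_LP_RUNNING);
	(void)old;
}

/* Freeing side: an acquire load that sees the flag clear also sees every
 * write made before the matching release, so the LWP may be torn down. */
static bool
lwp_can_free(struct lwp_sketch *l)
{
	return (atomic_load_explicit(&l->l_pflag, memory_order_acquire) &
	    SKETCH_LP_RUNNING) == 0;
}
```

A freeing thread would spin on lwp_can_free() before releasing the LWP's memory; no full barrier (membar_sync) is needed for this one-way hand-off.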


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.349 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.348 20-May-2020 maxv

future-proof-ness


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.347 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).


Revision tags: phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.346 04-Apr-2020 ad

branches: 1.346.2;
preempt_needed(), preempt_point(): simplify the definition of these and
key on ci_want_resched in the interests of interactive response.


# 1.345 26-Mar-2020 ad

Leave the idle LWPs in state LSIDL even when running, so they don't mess up
output from ps/top/etc. Correctness isn't at stake, LWPs in other states
are temporarily on the CPU at times too (e.g. LSZOMB, LSSLEEP).


# 1.344 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


# 1.343 14-Mar-2020 ad

- Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.342 23-Feb-2020 ad

kpause(): is only awoken via timeout or signal, so use SOBJ_SLEEPQ_NULL like
_lwp_park() does, and dispense with the hashed sleepq & lock.


# 1.341 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.340 16-Feb-2020 ad

nextlwp(): fix a couple of locking bugs including one I introduced yesterday,
and add comments around same.


# 1.339 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


Revision tags: ad-namecache-base2
# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, like with the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread was stopped with ptrace(2), userland process must not
be able to unstop it deliberately or by an accident.

This was a Windows-style behavior that makes thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point a suspect thread is already switched so that a backtrace gotten on
a KASSERT failure doesn't point out where a context switch happens.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into kernel in first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP can be migrated to another CPU, and in
all cases the scheduler postpones a migration if the target LWP is running. One
example of LWP migration is load balancing; the scheduler periodically
looks for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it is set on an LWP,
the scheduler doesn't schedule that LWP. A scheduled LWP is migrated
when it is leaving a running CPU, i.e., in mi_switch. And mi_switch does NOT check
the LP_BOUND flag. So if an LWP is scheduled first and then sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html
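A minimal C sketch of the race and the fix described above: the load balancer checks LP_BOUND when it schedules a migration, and the switch path checks it again, because the LWP may have bound itself in between. The struct, the target-CPU field, the flag value and both functions are hypothetical models, not the mi_switch() code.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_LP_BOUND	0x1u	/* hypothetical bit value for LP_BOUND */

struct cpu_sketch { int id; };

struct lwp_bind_sketch {
	unsigned int l_pflag;
	struct cpu_sketch *l_cpu;		/* CPU it currently runs on */
	struct cpu_sketch *l_target_cpu;	/* migration requested, or NULL */
};

/* Load-balancer side: only schedule a migration for unbound LWPs. */
static void
schedule_migration(struct lwp_bind_sketch *l, struct cpu_sketch *to)
{
	if ((l->l_pflag & SKETCH_LP_BOUND) == 0)
		l->l_target_cpu = to;
}

/* Switch side: re-check LP_BOUND, since the flag may have been set after
 * the migration was scheduled.  Returns the CPU the LWP should run on. */
static struct cpu_sketch *
switch_pick_cpu(struct lwp_bind_sketch *l)
{
	struct cpu_sketch *to = l->l_target_cpu;

	assert(l->l_cpu != NULL);
	l->l_target_cpu = NULL;
	if (to == NULL || (l->l_pflag & SKETCH_LP_BOUND) != 0)
		return l->l_cpu;	/* bound (or no request): stay put */
	return to;
}
```

Without the second check, the first sequence in the test below would move a bound LWP.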


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsuya Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptibility.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu; it should already be correct in the returned
LWP (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the former.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- A few branch predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it there,
and it will be handled at EOI anyway. This can happen when kernel_lock is
released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodic balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into a separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2, the implementation of a new scheduler based on the
original SVR4 approach, with some inspiration for balancing and migration
taken from Solaris. It implements per-CPU runqueues, provides real-time (RT)
and time-sharing (TS) queues, is ready to support the POSIX real-time
extensions, and is also prepared for CPU affinity support.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches which are not
usually taken for the old code, and one forward branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forward branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symbol definition error
by code duplication of gcc4 optimizer. Also note about this in comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
on each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match the current reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high (i.e., solve #1).

i left kinfo_proc2::p_estcpu (i.e., ps -O cpu) scaled because i have
no idea how its absolute value is used other than for debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate over all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Clean up signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

Fix a problem with truncated output from ptys. Fix by enami.
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't dereference l if you are going to test whether it is NULL a
couple of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve comment for ltsleep(), the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get on appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Also stop runnable but swapped-out user processes in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process, so we can't restart scheduling;
this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all cpu's rather
than from a timer callout; this allows time-slicing on non-primary cpu's.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left are the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a load average of one. I killed
this: it makes the behavior hard to control, almost impossible to analyze,
and the effect (~nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.352 26-Oct-2022 riastradh

kern/kern_synch.c: Get averunnable from sys/resource.h.


Revision tags: bouyer-sunxi-drm-base
# 1.351 29-Jun-2022 riastradh

sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

sleepq_enter(sq, l, lock); // locks l
...
sleepq_enqueue(sq, ..., sobj, ...); // assumes l locked, sets l_syncobj
... (*)
sleepq_block(...); // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com


# 1.350 10-Mar-2022 riastradh

kern: Fix synchronization of clearing LP_RUNNING and lwp_free.

1. membar_sync is not necessary here -- only a store-release is
required.

2. membar_consumer _before_ loading l->l_pflag is not enough; a
load-acquire is required.

Actually it's not really clear to me why any barriers are needed, since
the store-release and load-acquire should be implied by releasing and
acquiring the lwp lock (and maybe we could spin with the lock instead
of reading l->l_pflag unlocked). But maybe there's something subtle
about access to l->l_mutex that's not obvious here.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.349 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.348 20-May-2020 maxv

future-proof-ness


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.347 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).


Revision tags: phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.346 04-Apr-2020 ad

branches: 1.346.2;
preempt_needed(), preempt_point(): simplify the definition of these and
key on ci_want_resched in the interests of interactive response.


# 1.345 26-Mar-2020 ad

Leave the idle LWPs in state LSIDL even when running, so they don't mess up
output from ps/top/etc. Correctness isn't at stake, LWPs in other states
are temporarily on the CPU at times too (e.g. LSZOMB, LSSLEEP).


# 1.344 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


# 1.343 14-Mar-2020 ad

- Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.342 23-Feb-2020 ad

kpause(): is only awoken via timeout or signal, so use SOBJ_SLEEPQ_NULL like
_lwp_park() does, and dispense with the hashed sleepq & lock.


# 1.341 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.340 16-Feb-2020 ad

nextlwp(): fix a couple of locking bugs including one I introduced yesterday,
and add comments around same.


# 1.339 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


Revision tags: ad-namecache-base2
# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, like with the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread has been stopped with ptrace(2), the userland process must
not be able to unstop it, deliberately or by accident.

This was Windows-style behavior that made thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was a poor choice because by that
point the suspect thread has already been switched out, so a backtrace obtained
from a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never reached a state where it would be
used and usable, due to reliability problems and limited & complicated MD support.

Going forward, we need to concentrate on interfaces that do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways an LWP can be migrated to another CPU, and in
every case the scheduler postpones the migration if the target LWP is running. One
example of LWP migration is load balancing: the scheduler periodically
looks for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it is set on an LWP,
the scheduler doesn't schedule that LWP. A scheduled LWP is actually migrated
when it leaves its running CPU, i.e., in mi_switch. But mi_switch does NOT check
the LP_BOUND flag. So if an LWP is scheduled for migration first and then sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsuya Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptability.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case when l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides a
real-time (RT) and time-sharing (TS) queues, ready to support a POSIX
real-time extensions, and also prepared for the support of CPU affinity.

The following lines in the kernel config enables the SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forwards branches which are not
usually taken for the old code, and one forwards branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symble definition error
by code duplication of gcc4 optimizer. Also note about this in comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu are not likely to be increased.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve comment for ltsleep(), the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get the appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all cpu's rather
than from a timer callout; this allows time-slicing on non-primary cpu's.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
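A minimal sketch of the two properties listed above (client-owned handle storage, constant-time insert and remove), assuming a simple hashed timer wheel built on <sys/queue.h> TAILQs. The names toy_callout, toy_wheel, WHEEL_SIZE and the toy_* functions are hypothetical; this is not the NetBSD callout(9) implementation, only an illustration of the idea.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

#define WHEEL_SIZE 256	/* assumed bucket count */

/* Caller-owned handle: scheduling never allocates. Zero it before first use. */
struct toy_callout {
	TAILQ_ENTRY(toy_callout) link;
	unsigned expire;		/* absolute tick at which to fire */
	bool pending;
	void (*func)(void *);
	void *arg;
};

TAILQ_HEAD(bucket, toy_callout);

struct toy_wheel {
	struct bucket b[WHEEL_SIZE];
	unsigned now;			/* current tick */
};

static void
wheel_init(struct toy_wheel *w)
{
	for (int i = 0; i < WHEEL_SIZE; i++)
		TAILQ_INIT(&w->b[i]);
	w->now = 0;
}

/* O(1): unlink if already pending, hash the new expiry into a bucket, append. */
static void
toy_callout_reset(struct toy_wheel *w, struct toy_callout *c, unsigned ticks,
    void (*func)(void *), void *arg)
{
	if (ticks == 0)
		ticks = 1;	/* a zero delay would fire in the current pass */
	if (c->pending)
		TAILQ_REMOVE(&w->b[c->expire % WHEEL_SIZE], c, link);
	c->expire = w->now + ticks;
	c->func = func;
	c->arg = arg;
	c->pending = true;
	TAILQ_INSERT_TAIL(&w->b[c->expire % WHEEL_SIZE], c, link);
}

/* O(1): the doubly linked entry unlinks itself without searching. */
static void
toy_callout_stop(struct toy_wheel *w, struct toy_callout *c)
{
	if (c->pending) {
		TAILQ_REMOVE(&w->b[c->expire % WHEEL_SIZE], c, link);
		c->pending = false;
	}
}

/* Advance one tick; entries in the bucket due on a later lap are skipped. */
static void
wheel_tick(struct toy_wheel *w)
{
	struct bucket *bk;
	struct toy_callout *c, *next;

	w->now++;
	bk = &w->b[w->now % WHEEL_SIZE];
	for (c = TAILQ_FIRST(bk); c != NULL; c = next) {
		next = TAILQ_NEXT(c, link);
		if (c->expire == w->now) {
			TAILQ_REMOVE(bk, c, link);
			c->pending = false;
			c->func(c->arg);
		}
	}
}
```

Hashing by expiry tick keeps insertion independent of how many callouts are outstanding; the cost moves to the tick handler, which only scans one bucket per tick.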


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.
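The difference between wakeup() and wakeup_one() can be illustrated with a toy sleep queue. The struct sleeper type and the toy_* functions are hypothetical, and breaking priority ties by queue order is an assumption about what "first in line" means here; the real code walks the kernel's hashed sleep queues.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy sleep queue entry; not the kernel's structure. */
struct sleeper {
	const void *wchan;	/* identifier the sleeper waits on */
	int priority;		/* lower value = higher priority (BSD convention) */
	bool awake;
};

/* wakeup(): wake every sleeper on chan (the "thundering herd"). */
static size_t
toy_wakeup(struct sleeper *q, size_t n, const void *chan)
{
	size_t woken = 0;

	for (size_t i = 0; i < n; i++) {
		if (!q[i].awake && q[i].wchan == chan) {
			q[i].awake = true;
			woken++;
		}
	}
	return woken;
}

/*
 * wakeup_one(): wake only the best-priority sleeper on chan; ties go
 * to the earliest entry in queue order. Returns its index, or -1.
 */
static long
toy_wakeup_one(struct sleeper *q, size_t n, const void *chan)
{
	long best = -1;

	for (size_t i = 0; i < n; i++) {
		if (q[i].awake || q[i].wchan != chan)
			continue;
		if (best < 0 || q[i].priority < q[best].priority)
			best = (long)i;
	}
	if (best >= 0)
		q[best].awake = true;
	return best;
}
```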


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the header
files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broke.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing at all for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.
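The clipping arithmetic from the nice-bug description can be checked with a small sketch. NICE_WEIGHT comes from the text; PRIO_MAX = 20, PPQ = 4 and PUSER = 50 are assumed 4.4BSD-style values, and the priority formula below (PUSER + p_estcpu + NICE_WEIGHT * nice, with nice relative to NZERO) is an assumption about the post-change code, not a quote of it.

```c
/* Assumed constants for illustration only; check sys/param.h and sys/sched.h. */
#define NICE_WEIGHT	2	/* "nice weight of two", per the entry above */
#define PRIO_MAX	20
#define PPQ		4	/* priorities per run queue (assumed) */
#define PUSER		50	/* base user priority (assumed) */

/* Upper bound on the decayed CPU estimate, as described above. */
#define ESTCPU_LIMIT	(NICE_WEIGHT * PRIO_MAX - PPQ)

static unsigned
clip_estcpu(unsigned estcpu)
{
	return estcpu < (unsigned)ESTCPU_LIMIT ? estcpu : (unsigned)ESTCPU_LIMIT;
}

/* Assumed priority formula; nice ranges over -20..20 relative to NZERO. */
static int
user_priority(unsigned estcpu, int nice)
{
	return PUSER + (int)clip_estcpu(estcpu) + NICE_WEIGHT * nice;
}

/* Run queue index for a priority. */
static int
run_queue(int pri)
{
	return pri / PPQ;
}
```

With these values the limit is 36, so a nice 0 process with the largest possible estimate (priority 86, queue 21) still sorts ahead of an idle nice +20 process (priority 90, queue 22). An unclipped estimate of 40 would put both on queue 22, which is the "same queue as nice +20" failure described above.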


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.351 29-Jun-2022 riastradh

sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

sleepq_enter(sq, l, lock); // locks l
...
sleepq_enqueue(sq, ..., sobj, ...); // assumes l locked, sets l_syncobj
... (*)
sleepq_block(...); // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com


# 1.350 10-Mar-2022 riastradh

kern: Fix synchronization of clearing LP_RUNNING and lwp_free.

1. membar_sync is not necessary here -- only a store-release is
required.

2. membar_consumer _before_ loading l->l_pflag is not enough; a
load-acquire is required.

Actually it's not really clear to me why any barriers are needed, since
the store-release and load-acquire should be implied by releasing and
acquiring the lwp lock (and maybe we could spin with the lock instead
of reading l->l_pflag unlocked). But maybe there's something subtle
about access to l->l_mutex that's not obvious here.
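The store-release/load-acquire pairing described in points 1 and 2 can be sketched with C11 <stdatomic.h>. struct fake_lwp and its fields are hypothetical stand-ins for the LWP and LP_RUNNING; in the kernel, atomic_loadstore(9) provides atomic_store_release() and atomic_load_acquire() for the same purpose.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an LWP; field names are illustrative only. */
struct fake_lwp {
	int state;		/* ordinary data written before the flag clears */
	atomic_bool running;	/* plays the role of LP_RUNNING */
};

/*
 * Switching-away side: the release store orders every earlier store
 * (here, l->state) before the store that clears the flag.
 */
static void
finish_switch(struct fake_lwp *l, int final_state)
{
	l->state = final_state;
	atomic_store_explicit(&l->running, false, memory_order_release);
}

/*
 * Freeing side: the acquire load pairs with the release store, so once
 * the flag is observed clear, the earlier stores are visible and nothing
 * after the load (such as freeing the object) can be reordered before it.
 */
static int
wait_not_running(struct fake_lwp *l)
{
	while (atomic_load_explicit(&l->running, memory_order_acquire))
		continue;	/* spin until the switching CPU lets go */
	return l->state;
}
```

Run single-threaded, the sketch only shows where the ordering annotations go; the guarantee matters when finish_switch() and wait_not_running() execute on different CPUs.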


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.349 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.348 20-May-2020 maxv

future-proof-ness


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.347 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).


Revision tags: phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.346 04-Apr-2020 ad

branches: 1.346.2;
preempt_needed(), preempt_point(): simplify the definition of these and
key on ci_want_resched in the interests of interactive response.


# 1.345 26-Mar-2020 ad

Leave the idle LWPs in state LSIDL even when running, so they don't mess up
output from ps/top/etc. Correctness isn't at stake, LWPs in other states
are temporarily on the CPU at times too (e.g. LSZOMB, LSSLEEP).


# 1.344 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


# 1.343 14-Mar-2020 ad

- Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.342 23-Feb-2020 ad

kpause(): is only awoken via timeout or signal, so use SOBJ_SLEEPQ_NULL like
_lwp_park() does, and dispense with the hashed sleepq & lock.


# 1.341 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.340 16-Feb-2020 ad

nextlwp(): fix a couple of locking bugs including one I introduced yesterday,
and add comments around same.


# 1.339 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


Revision tags: ad-namecache-base2
# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, like with the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread has been stopped with ptrace(2), the userland process must
not be able to unstop it, either deliberately or by accident.

This was Windows-style behavior that made thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change where we check that a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good choice because by that
point the suspect thread has already been switched out, so a backtrace obtained on
a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until every
CPU got an LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never reached a state where it would be
used and usable, due to reliability problems and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than on making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways an LWP can be migrated to another CPU, and in
all cases the scheduler postpones the migration if the target LWP is running. One
example of LWP migration is load balancing: the scheduler periodically
looks for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it is set on an LWP,
the scheduler doesn't schedule that LWP. The migration of a scheduled LWP is
attempted when it leaves a running CPU, i.e., in mi_switch. And mi_switch does NOT check
the LP_BOUND flag. So if an LWP is scheduled first and then sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html
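
A single-threaded C model of the race and the fix, under stated assumptions: struct lwp_model, LP_BOUND_MODEL and the helper functions are hypothetical stand-ins, not the kernel's lwp_t, LP_BOUND or sched_lwp_stats(). The point is only the ordering: the balancer sees an unbound LWP and schedules it, the LWP then binds itself, and the switch-away path must re-check the flag before migrating.

```c
#include <assert.h>
#include <stdbool.h>

#define LP_BOUND_MODEL 0x1u  /* hypothetical stand-in for LP_BOUND */
#define NO_TARGET      (-1)  /* no migration requested */

struct lwp_model {
	unsigned pflags;  /* private flags, as in l_pflags */
	int cpu;          /* CPU the LWP currently runs on */
	int target_cpu;   /* migration request from the balancer */
};

/* Load balancer: only schedules LWPs that are not (yet) bound. */
static void
balancer_schedule(struct lwp_model *l, int dest)
{
	if ((l->pflags & LP_BOUND_MODEL) == 0)
		l->target_cpu = dest;
}

/* Model of curlwp_bind(): may run after the balancer already decided. */
static void
bind_model(struct lwp_model *l)
{
	l->pflags |= LP_BOUND_MODEL;
}

/*
 * Model of the switch-away path. Without the LP_BOUND re-check, an LWP
 * that was scheduled before binding would still migrate. In this model
 * a stale migration request is simply dropped.
 */
static void
switch_away(struct lwp_model *l)
{
	assert(l->cpu >= 0);
	if (l->target_cpu != NO_TARGET && (l->pflags & LP_BOUND_MODEL) == 0)
		l->cpu = l->target_cpu;
	l->target_cpu = NO_TARGET;
}
```

Removing the LP_BOUND_MODEL test from switch_away() reproduces the bug: an LWP that binds itself after being scheduled ends up on the balancer's chosen CPU anyway.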


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptibility.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.
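
A minimal userland model of the pitfall above, assuming nothing about the real kpause(9) internals: with hz = 50 the idiom hz/100 truncates to 0, which would mean "forever". kpause_args_ok() mirrors the new assertion; usec_to_ticks_roundup() is a hypothetical helper (not the usec2ticks() the comment wishes for, and not a NetBSD API) that rounds up so any non-zero duration yields at least one tick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the new assertion: timo == 0 means "no timeout", which is
 * only acceptable when the sleep can be interrupted by a signal.
 */
static bool
kpause_args_ok(int timo, bool intr)
{
	return timo != 0 || intr;
}

/*
 * Hypothetical helper (not a NetBSD API): convert microseconds to clock
 * ticks at `hz` ticks per second, rounding up so that a non-zero
 * duration becomes at least one tick instead of truncating to 0.
 */
static int
usec_to_ticks_roundup(uint64_t usec, int hz)
{
	assert(hz > 0);
	return (int)((usec * (uint64_t)hz + 999999) / 1000000);
}
```

For example, with hz = 50, hz / 100 is 0 and fails the check when intr is false, while usec_to_ticks_roundup(10000, 50) gives 1 tick.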


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu; it should already be correct on return (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- A few branch predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.
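
The three deferral conditions listed above can be expressed as a single predicate. This is a sketch only: struct cpu_state_model, its fields and the IPL constants are hypothetical, and the real kernel keeps this state in its own per-LWP and per-CPU structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical IPL levels for the model; IPL_NONE_M is the lowest. */
enum ipl_model { IPL_NONE_M = 0, IPL_VM_M = 2 };

struct cpu_state_model {
	int kernel_lock_count;  /* > 0 while kernel_lock is held (not MT safe) */
	int kpreempt_nest;      /* kpreempt_disable() nesting depth */
	enum ipl_model ipl;     /* current interrupt priority level */
};

/* Returns true if a pending kernel preemption must be deferred. */
static bool
kpreempt_deferred(const struct cpu_state_model *c)
{
	assert(c->kernel_lock_count >= 0 && c->kpreempt_nest >= 0);
	return c->kernel_lock_count > 0 ||  /* holding kernel_lock */
	    c->kpreempt_nest > 0 ||         /* inside kpreempt_disable/enable */
	    c->ipl > IPL_NONE_M;            /* IPL raised above IPL_NONE */
}
```

Counting nesting depth, rather than using a boolean, lets kpreempt_disable/kpreempt_enable brackets nest safely in this model.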


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".
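
A sketch of the per-CPU counting pattern described above, assuming a fixed CPU count and hypothetical names: each CPU bumps its own slot on the hot path, and periodically folds its slot into a global total (the entry says this happens in schedclock() at about 16 Hz). The model is single-threaded; in a real SMP kernel the add into the global total would need an atomic operation or a lock.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU_MODEL 4   /* hypothetical fixed CPU count */

/* Per-CPU switch counters; each slot is only written by its own CPU. */
static uint64_t percpu_nswtch[NCPU_MODEL];

/* Global total, only brought up to date when a CPU folds its slot. */
static uint64_t global_nswtch;

/* Hot path: called on every context switch, touches only local data. */
static void
count_switch(int cpu)
{
	assert(cpu >= 0 && cpu < NCPU_MODEL);
	percpu_nswtch[cpu]++;
}

/* Periodic path: move this CPU's count into the global total. */
static void
fold_counters(int cpu)
{
	assert(cpu >= 0 && cpu < NCPU_MODEL);
	global_nswtch += percpu_nswtch[cpu];
	percpu_nswtch[cpu] = 0;
}
```

The trade-off is that the global value lags by up to one fold interval, in exchange for keeping the per-switch path free of shared cache lines.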


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2, the implementation of a new scheduler based
on the original SVR4 approach, with some inspiration for balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support POSIX
real-time extensions, and is also prepared for CPU affinity support.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to the slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forwards branches which are not
usually taken for the old code, and one forwards branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable the asm statement which defines the bpendtsleep symbol as a "handy
breakpoint" on all m68k ports, since it may cause a multiple symbol definition
error due to code duplication by the gcc4 optimizer. Also note this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve the comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get on appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process, so we can't restart scheduling;
this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes except curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all CPUs rather
than from a timer callout; this allows time-slicing on non-primary CPUs.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock during
the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the header files,
as UVM still uses (most of) them.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a load average of one. I killed
this: it makes the behavior hard to control, almost impossible to analyze,
and the effect (~nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.
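The ".75 nice" figure can be checked numerically. The sketch below assumes the 4.4BSD-style update the message describes as removed, where schedcpu() did p_estcpu = decay(p_estcpu) + nice once a second with a decay factor of (2 * loadav) / (2 * loadav + 1). The helper names and the use of doubles instead of the kernel's fixed-point arithmetic are illustrative simplifications, not the kernel code.

```c
#include <assert.h>

/*
 * Assumed per-second decay factor, as in the 4.4BSD decay_cpu()
 * comment: decay = (2 * loadav) / (2 * loadav + 1).
 */
static double
estcpu_decay(double loadav)
{
	assert(loadav >= 0.0);
	return (2.0 * loadav) / (2.0 * loadav + 1.0);
}

/*
 * Simulate the old behaviour (hypothetical helper): once a second
 * p_estcpu is decayed and then has the nice value added to it.
 * Starts from zero.
 */
static double
old_estcpu_after(double loadav, double nice, int seconds)
{
	double d = estcpu_decay(loadav);
	double estcpu = 0.0;

	for (int i = 0; i < seconds; i++)
		estcpu = d * estcpu + nice;
	return estcpu;
}

/* Fixed point of estcpu = d * estcpu + nice, i.e. nice / (1 - d). */
static double
old_estcpu_limit(double loadav, double nice)
{
	return nice / (1.0 - estcpu_decay(loadav));
}
```

With loadav = 1 the decay is 2/3, so old_estcpu_limit(1.0, nice) is 3 * nice. Divided by 4 in pri = p_estcpu/4 + nice*2, that gives the additional 0.75 * nice from the message, three times the nice/4 (1/8 of nice*2) seen after the first second.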

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic niceing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS IDs, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.350 10-Mar-2022 riastradh

kern: Fix synchronization of clearing LP_RUNNING and lwp_free.

1. membar_sync is not necessary here -- only a store-release is
required.

2. membar_consumer _before_ loading l->l_pflag is not enough; a
load-acquire is required.

Actually it's not really clear to me why any barriers are needed, since
the store-release and load-acquire should be implied by releasing and
acquiring the lwp lock (and maybe we could spin with the lock instead
of reading l->l_pflag unlocked). But maybe there's something subtle
about access to l->l_mutex that's not obvious here.
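The store-release / load-acquire pairing described above can be illustrated in userland with C11 <stdatomic.h> and POSIX threads. This is only an analogue: the names running and payload are hypothetical stand-ins for LP_RUNNING in l->l_pflag and the LWP state that lwp_free() must not see half-written, and the kernel uses its own primitives (see atomic_loadstore(9)) rather than these C11 calls.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for LP_RUNNING and the LWP's state. */
static int payload;
static atomic_bool running;

/*
 * Switching-away side: finish all writes, then clear the flag with a
 * store-release so those writes are visible to anyone who observes
 * the cleared flag with a load-acquire.
 */
static void *
switch_away(void *arg)
{
	(void)arg;
	/* Precondition: the waiter set the flag before creating us. */
	assert(atomic_load_explicit(&running, memory_order_relaxed));
	payload = 42;
	atomic_store_explicit(&running, false, memory_order_release);
	return NULL;
}

/*
 * Freeing side: spin with a load-acquire. Once it sees the flag
 * clear, every write made before the release store is visible.
 * Returns the payload it observed, or -1 if the thread failed.
 */
static int
wait_then_read(void)
{
	pthread_t t;

	payload = 0;
	atomic_store_explicit(&running, true, memory_order_relaxed);
	if (pthread_create(&t, NULL, switch_away, NULL) != 0)
		return -1;
	while (atomic_load_explicit(&running, memory_order_acquire))
		continue;	/* spin until the other thread is done */
	int seen = payload;	/* ordered after payload = 42 */
	pthread_join(t, NULL);
	return seen;
}
```

A plain membar_consumer() before an ordinary load, as the message notes, does not give this guarantee; it is the acquire on the flag load itself that orders the subsequent read of payload.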


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.349 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.348 20-May-2020 maxv

future-proof-ness


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.347 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).


Revision tags: phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.346 04-Apr-2020 ad

branches: 1.346.2;
preempt_needed(), preempt_point(): simplify the definition of these and
key on ci_want_resched in the interests of interactive response.


# 1.345 26-Mar-2020 ad

Leave the idle LWPs in state LSIDL even when running, so they don't mess up
output from ps/top/etc. Correctness isn't at stake, LWPs in other states
are temporarily on the CPU at times too (e.g. LSZOMB, LSSLEEP).


# 1.344 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


# 1.343 14-Mar-2020 ad

- Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.342 23-Feb-2020 ad

kpause(): is only awoken via timeout or signal, so use SOBJ_SLEEPQ_NULL like
_lwp_park() does, and dispense with the hashed sleepq & lock.


# 1.341 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.340 16-Feb-2020 ad

nextlwp(): fix a couple of locking bugs including one I introduced yesterday,
and add comments around same.


# 1.339 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


Revision tags: ad-namecache-base2
# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, like with the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread has been stopped with ptrace(2), the userland process must not
be able to unstop it deliberately or by accident.

This was a Windows-style behavior that makes thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point the suspect thread has already been switched out, so a backtrace obtained on
a KASSERT failure doesn't point to where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflag of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP can be migrated to another CPU, and in
all cases the scheduler postpones a migration if the target LWP is running. One
example of LWP migration is load balancing; the scheduler periodically
looks for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it is set on an LWP,
the scheduler doesn't schedule that LWP. The migration of a scheduled LWP is attempted
when it leaves its CPU, i.e., in mi_switch. And mi_switch does NOT check
the LP_BOUND flag. So if an LWP is scheduled first and then sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsuya Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptability.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.
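The hz/n pitfall is easy to reproduce: with integer division, hz/n is 0 whenever hz < n, and a timo of 0 means "no timeout". The sketch contrasts that with a hypothetical usec_to_ticks() helper (not an existing kernel function; the message only wishes for a usec2ticks() routine) that rounds up, so any nonzero interval becomes at least one tick.

```c
#include <assert.h>
#include <limits.h>

/*
 * The idiom criticised above: "1/n of a second" in ticks. With
 * integer division this is 0 whenever hz < n, which kpause() would
 * treat as "no timeout at all".
 */
static int
ticks_by_division(int hz, int n)
{
	assert(n > 0);
	return hz / n;
}

/*
 * Hypothetical helper: convert microseconds to clock ticks, rounding
 * up, so usec > 0 always yields at least one tick. Clamped to INT_MAX.
 */
static int
usec_to_ticks(long long usec, int hz)
{
	assert(usec >= 0 && hz > 0);
	long long ticks = (usec * hz + 999999) / 1000000;

	if (ticks > INT_MAX)
		ticks = INT_MAX;
	return (int)ticks;
}
```

For example, with hz = 100 a 5 ms request (hz/200) collapses to 0 by division, while usec_to_ticks(5000, 100) rounds up to 1 tick.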


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync
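The two arrays must stay in sync because the load-average update walks them in parallel, one decay constant per averaged value. A floating-point sketch, assuming the traditional BSD constants exp(-5/60), exp(-5/300) and exp(-5/900) for 1-, 5- and 15-minute averages sampled every 5 seconds (the kernel keeps them in fixed point scaled by FSCALE). C11 _Static_assert stands in for CTASSERT; cexp_f, ldavg_f and ldavg_update are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* exp(-5/60), exp(-5/300), exp(-5/900): per-sample decay constants. */
static const double cexp_f[] = {
	0.9200444146293232,
	0.9834714538216174,
	0.9944598480048967,
};

/* 1-, 5- and 15-minute load averages, starting at zero. */
static double ldavg_f[3];

/*
 * Compile-time check in the spirit of the CTASSERT: adding a decay
 * constant without growing ldavg_f (or vice versa) fails to build.
 */
_Static_assert(sizeof(cexp_f) / sizeof(cexp_f[0]) ==
    sizeof(ldavg_f) / sizeof(ldavg_f[0]),
    "cexp_f and ldavg_f must have the same number of entries");

/* One sample: move each average toward nrun by (1 - cexp) of the gap. */
static void
ldavg_update(int nrun)
{
	assert(nrun >= 0);
	for (size_t i = 0; i < sizeof(ldavg_f) / sizeof(ldavg_f[0]); i++)
		ldavg_f[i] = cexp_f[i] * ldavg_f[i] +
		    (double)nrun * (1.0 - cexp_f[i]);
}
```

With one runnable LWP, the 1-minute average rises fastest (about 0.08 after the first sample) and all three approach 1 over time.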


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the former.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans on the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@).

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it there, and
it will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodic balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into a separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clauses 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forwards branches which are not
usually taken for the old code, and one forwards branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symbol definition error
due to code duplication by the gcc4 optimizer. Also add a note about this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu are not likely increased.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.
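
A minimal sketch of what the two bit orderings mean for sched_whichqs, assuming a 32-bit mask with one bit per run queue and queue 0 as the highest priority. The commit only names the mask and the __HAVE_BIGENDIAN_BITOPS switch; the helper names below are hypothetical, not NetBSD's. With LSB order the first non-empty queue is found by counting trailing zeros. With MSB order it is found by counting leading zeros, which is presumably the reason for the option (PowerPC has a count-leading-zeros instruction). __builtin_ctz/__builtin_clz are GCC/Clang builtins and are undefined for 0, hence the guards.

```c
#include <assert.h>
#include <stdint.h>

#define NQS 32	/* assumed: 32 run queues, queue 0 = highest priority */

/* LSB order (the default): queue q maps to bit (1 << q). */
static inline uint32_t
qbit_lsb(int q)
{
	assert(q >= 0 && q < NQS);
	return UINT32_C(1) << q;
}

/* MSB order (__HAVE_BIGENDIAN_BITOPS): queue q maps to bit (0x80000000 >> q). */
static inline uint32_t
qbit_msb(int q)
{
	assert(q >= 0 && q < NQS);
	return UINT32_C(0x80000000) >> q;
}

/* First non-empty queue in LSB order, or -1: count trailing zeros. */
static int
firstq_lsb(uint32_t whichqs)
{
	return whichqs == 0 ? -1 : __builtin_ctz(whichqs);
}

/* First non-empty queue in MSB order, or -1: count leading zeros. */
static int
firstq_msb(uint32_t whichqs)
{
	return whichqs == 0 ? -1 : __builtin_clz(whichqs);
}
```

Either way the same queue index comes back; only the bit that represents it, and hence the instruction that finds it, differs.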


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve the comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get the appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process, so we can't restart scheduling;
this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes except curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all CPUs rather
than from a timer callout; this allows time-slicing on non-primary CPUs.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock in
the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the header
files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this: it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing at all for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.349 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.348 20-May-2020 maxv

future-proof-ness


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.347 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).


Revision tags: phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.346 04-Apr-2020 ad

branches: 1.346.2;
preempt_needed(), preempt_point(): simplify the definition of these and
key on ci_want_resched in the interests of interactive response.


# 1.345 26-Mar-2020 ad

Leave the idle LWPs in state LSIDL even when running, so they don't mess up
output from ps/top/etc. Correctness isn't at stake, LWPs in other states
are temporarily on the CPU at times too (e.g. LSZOMB, LSSLEEP).


# 1.344 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


# 1.343 14-Mar-2020 ad

- Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.342 23-Feb-2020 ad

kpause(): is only awoken via timeout or signal, so use SOBJ_SLEEPQ_NULL like
_lwp_park() does, and dispense with the hashed sleepq & lock.


# 1.341 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.340 16-Feb-2020 ad

nextlwp(): fix a couple of locking bugs including one I introduced yesterday,
and add comments around same.


# 1.339 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


Revision tags: ad-namecache-base2
# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, like with the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over-eager KASSERT defeated by kernel preemption.
Discovered during automated testing.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread has been stopped with ptrace(2), the userland process must not
be able to unstop it, deliberately or by accident.

This was Windows-style behavior that made thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because by that
point the suspect thread has already been switched out, so a backtrace obtained
on a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on an SMP machine with few processes running (e.g. while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never reached a state where it was
used and usable, due to reliability problems and limited & complicated MD
support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O),
rather than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways an LWP can be migrated to another CPU, and
in all cases the scheduler postpones the migration if the target LWP is
running. One example is load balancing: the scheduler periodically looks for
CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats). At that
point the scheduler checks the LP_BOUND flag, and if it is set on an LWP, the
scheduler doesn't schedule that LWP. A scheduled LWP is migrated when it
leaves its CPU, i.e., in mi_switch, and mi_switch does NOT check the LP_BOUND
flag. So if an LWP is scheduled for migration first and then sets the
LP_BOUND flag, it can be migrated regardless of the flag. To avoid this race
condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptibility.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit (the
integer division hz/n truncates to 0 once hz < n). With a
usec2ticks() routine it would at least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() could be cleaned up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu; it should already be correct in the returned LWP (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the former.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Add a few branch predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it there,
and it will be handled at EOI anyway. This can happen when kernel_lock is
released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodic balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into a separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back the equivalent of the change in rev 1.189, which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2, an implementation of a new scheduler based on the
original SVR4 approach, with some ideas on balancing and migration taken
from Solaris. It implements per-CPU runqueues, provides real-time (RT) and
time-sharing (TS) queues, is ready to support the POSIX real-time
extensions, and is prepared for CPU affinity support.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forwards branches which are not
usually taken for the old code, and one forwards branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports, since code duplication by the gcc4 optimizer may cause a
multiple symbol definition error. Also note this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
on each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve the comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get the appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes except curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all CPUs rather
than from a timer callout; this allows time-slicing on non-primary CPUs.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to procs, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that is has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the header
files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broke.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~nothing at all for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic niceing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.348 20-May-2020 maxv

future-proof-ness


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.347 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).


Revision tags: phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.346 04-Apr-2020 ad

branches: 1.346.2;
preempt_needed(), preempt_point(): simplify the definition of these and
key on ci_want_resched in the interests of interactive response.


# 1.345 26-Mar-2020 ad

Leave the idle LWPs in state LSIDL even when running, so they don't mess up
output from ps/top/etc. Correctness isn't at stake, LWPs in other states
are temporarily on the CPU at times too (e.g. LSZOMB, LSSLEEP).


# 1.344 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


# 1.343 14-Mar-2020 ad

- Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.342 23-Feb-2020 ad

kpause(): is only awoken via timeout or signal, so use SOBJ_SLEEPQ_NULL like
_lwp_park() does, and dispense with the hashed sleepq & lock.


# 1.341 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.340 16-Feb-2020 ad

nextlwp(): fix a couple of locking bugs including one I introduced yesterday,
and add comments around same.


# 1.339 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


Revision tags: ad-namecache-base2
# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, like with the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread was stopped with ptrace(2), userland process must not
be able to unstop it deliberately or by an accident.

This was a Windows-style behavior that makes thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point a suspect thread has already been switched away, so a backtrace obtained on
a KASSERT failure doesn't point out where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into kernel in first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag to l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP is migrated to another CPU, and in
any case the scheduler postpones a migration if a target LWP is running. One
example of LWP migration is load balancing; the scheduler periodically
scans for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it is set on an LWP,
the scheduler doesn't schedule that LWP. A scheduled migration is attempted
when the LWP leaves its CPU, i.e., in mi_switch. And mi_switch does NOT check
the LP_BOUND flag. So if an LWP is scheduled first and then it sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptability.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline function that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the former.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu, causing
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlewlp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspiration for balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlewlp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptable. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches which are not
usually taken for the old code, and one forwards branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symbol definition error
through code duplication by the gcc4 optimizer. Also note this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve the comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get on appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process, so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes except curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all CPUs rather
than from a timer callout; this allows time-slicing on non-primary CPUs.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock during
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left are the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a load average of one. I killed
this; it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic niceing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


Revision tags: bouyer-xenpvh-base1
# 1.347 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptable
waits with turnstiles (not currently done).


Revision tags: phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.346 04-Apr-2020 ad

branches: 1.346.2;
preempt_needed(), preempt_point(): simplify the definition of these and
key on ci_want_resched in the interests of interactive response.


# 1.345 26-Mar-2020 ad

Leave the idle LWPs in state LSIDL even when running, so they don't mess up
output from ps/top/etc. Correctness isn't at stake, LWPs in other states
are temporarily on the CPU at times too (e.g. LSZOMB, LSSLEEP).


# 1.344 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


# 1.343 14-Mar-2020 ad

- Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.342 23-Feb-2020 ad

kpause(): is only awoken via timeout or signal, so use SOBJ_SLEEPQ_NULL like
_lwp_park() does, and dispense with the hashed sleepq & lock.


# 1.341 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.340 16-Feb-2020 ad

nextlwp(): fix a couple of locking bugs including one I introduced yesterday,
and add comments around same.


# 1.339 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


Revision tags: ad-namecache-base2
# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, like with the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread was stopped with ptrace(2), userland process must not
be able to unstop it deliberately or by an accident.

This was a Windows-style behavior that makes thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because by
that point the suspect thread has already been switched out, so a backtrace
taken on a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to a state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces that do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP can be migrated to another CPU,
and in all cases the scheduler postpones the migration if the target LWP is
running. One example of LWP migration is load balancing: the scheduler
periodically looks for CPU-hogging LWPs and schedules them to migrate (see
sched_lwp_stats). At that point the scheduler checks the LP_BOUND flag, and if
it is set on an LWP, the scheduler doesn't schedule that LWP. The scheduler
attempts to migrate a scheduled LWP when it leaves its CPU, i.e., in mi_switch,
and mi_switch does NOT check the LP_BOUND flag. So if an LWP is scheduled first
and then sets the LP_BOUND flag, the LWP can be migrated regardless of the flag.
To avoid this race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptability.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline function that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure no pointer aliases of it were passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu, causing
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import SCHED_M2, a new scheduler based on the original SVR4 approach,
with some ideas about balancing and migration taken from Solaris. It
implements per-CPU run queues, provides real-time (RT) and time-sharing
(TS) queues, is ready to support the POSIX real-time extensions, and is
prepared for CPU affinity support.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches which are not
usually taken for the old code, and one forward branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symbol definition error
through code duplication by the gcc4 optimizer. Also add a note about this
in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer duplicates the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get the appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes except curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all cpu's rather
than from a timer callout; this allows time-slicing on non-primary cpu's.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panicked there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) them.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing at all for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic niceing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.346 04-Apr-2020 ad

preempt_needed(), preempt_point(): simplify the definition of these and
key on ci_want_resched in the interests of interactive response.


# 1.345 26-Mar-2020 ad

Leave the idle LWPs in state LSIDL even when running, so they don't mess up
output from ps/top/etc. Correctness isn't at stake, LWPs in other states
are temporarily on the CPU at times too (e.g. LSZOMB, LSSLEEP).


# 1.344 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


# 1.343 14-Mar-2020 ad

- Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.


Revision tags: ad-namecache-base3
# 1.342 23-Feb-2020 ad

kpause(): is only awoken via timeout or signal, so use SOBJ_SLEEPQ_NULL like
_lwp_park() does, and dispense with the hashed sleepq & lock.


# 1.341 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.340 16-Feb-2020 ad

nextlwp(): fix a couple of locking bugs including one I introduced yesterday,
and add comments around same.


# 1.339 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


Revision tags: ad-namecache-base2
# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, such as the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller of mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread was stopped with ptrace(2), userland process must not
be able to unstop it deliberately or by an accident.

This was a Windows-style behavior that makes threading tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point a suspect thread has already been switched out, so a backtrace obtained on
a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on an SMP machine with few processes running (e.g. while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag to l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP is migrated to another CPU, and in
any case the scheduler postpones a migration if a target LWP is running. One
example of LWP migration is load balancing; the scheduler periodically
looks for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag and, if it is set on an LWP,
the scheduler doesn't schedule that LWP. The scheduler tries to migrate a scheduled
LWP when it leaves a running CPU, i.e., in mi_switch. And mi_switch does NOT check
the LP_BOUND flag. So if an LWP is scheduled first and then sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html
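A minimal user-space sketch of a variadic assertion macro in the spirit of this change. MODEL_ASSERTMSG and assert_fail() are hypothetical names, and the sketch records the message instead of calling panic(9); it is not NetBSD's actual KASSERTMSG definition. The caller's format string is pasted onto a fixed prefix by string-literal concatenation, so it must be a literal, and any extra arguments follow it.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Last failure message; a kernel would panic() instead of storing it. */
static char assert_buf[256];

static void
assert_fail(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(assert_buf, sizeof(assert_buf), fmt, ap);
	va_end(ap);
}

/*
 * Hypothetical variadic assertion. ##__VA_ARGS__ is a GCC/Clang
 * extension that drops the preceding comma when no extra arguments
 * are passed.
 */
#define MODEL_ASSERTMSG(e, msg, ...)					\
	do {								\
		if (!(e))						\
			assert_fail("assertion \"%s\" failed: " msg,	\
			    #e, ##__VA_ARGS__);				\
	} while (0)
```

For example, MODEL_ASSERTMSG(x == 4, "x=%d", x) with x set to 3 stores `assertion "x == 4" failed: x=3`.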


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptibility.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.
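A worked sketch of the pitfall described above, assuming hz is the clock tick rate: a timeout of hz/n ticks truncates to 0, which kpause treats as "no timeout", once hz < n. model_usec_to_ticks() is a hypothetical helper in the spirit of the wished-for usec2ticks(), not a NetBSD API; it rounds up so that a nonzero duration never becomes 0 ticks.

```c
#include <assert.h>
#include <stdint.h>

/* The hz/n idiom: integer division yields 0 once hz < n. */
static int
ticks_hz_over_n(int hz, int n)
{
	return hz / n;
}

/*
 * Hypothetical microseconds-to-ticks conversion that rounds up, so any
 * positive duration yields at least one tick.
 */
static int
model_usec_to_ticks(int64_t usec, int hz)
{
	if (usec <= 0)
		return 0;
	return (int)((usec * hz + 999999) / 1000000);
}
```

With hz = 50, a 10 ms timeout written as hz/100 becomes 0 ticks, while model_usec_to_ticks(10000, 50) gives 1 tick.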


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync
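CTASSERT is NetBSD's compile-time assertion macro. The same kind of check can be written in standard C11 with _Static_assert. A minimal sketch, assuming hypothetical arrays model_cexp and model_ldavg whose three-entry size and zero contents are placeholders rather than the kernel's definitions:

```c
#include <assert.h>
#include <stdint.h>

#define NELEM(a) (sizeof(a) / sizeof((a)[0]))

/* Placeholders: one decay constant per load-average slot. */
static const uint32_t model_cexp[3] = { 0 };
static uint32_t model_ldavg[3];

/* Compilation fails if one array is resized without the other. */
_Static_assert(NELEM(model_cexp) == NELEM(model_ldavg),
    "cexp and ldavg must have the same number of entries");
```

Because the check runs at compile time, a mismatch is caught before the kernel is ever booted, at no runtime cost.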


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.
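A hedged sketch of the pattern this commit describes: when the option is off, the hook becomes an empty static inline that the compiler discards, so call sites need no #ifdef of their own. MODEL_DTRACE_HOOKS, model_dtrace_switch_hook and the other model_* names are hypothetical stand-ins for KDTRACE_HOOKS and the real hook functions.

```c
#include <assert.h>
#include <stddef.h>

struct model_lwp { int id; };

#ifdef MODEL_DTRACE_HOOKS
/* Set by the tracing module when it is loaded; NULL otherwise. */
extern void (*model_dtrace_switch_hook)(struct model_lwp *);

static inline void
model_dtrace_switch(struct model_lwp *l)
{
	if (model_dtrace_switch_hook != NULL)
		model_dtrace_switch_hook(l);
}
#else
/* Option off: an empty inline that compiles away entirely. */
static inline void
model_dtrace_switch(struct model_lwp *l)
{
	(void)l;
}
#endif

/* A call site: identical source whether or not the option is set. */
static int
model_switch(struct model_lwp *l)
{
	model_dtrace_switch(l);
	return l->id;
}
```

Keeping the #ifdef in one header-style definition, rather than at every call site, is what lets the rest of the kernel stay free of conditional compilation.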


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the former.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure no pointer aliases of it were passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- A few branch-prediction hints.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodic balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into a separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.
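The three deferral conditions listed above can be modelled as a single predicate that is checked when preemption is requested and re-checked when a kpreempt_disable section ends. This is a hedged sketch: struct model_cpu and the model_* functions are hypothetical and do not reproduce the kernel's data structures or its kpreempt() logic.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_IPL_NONE 0

/* Hypothetical state consulted before preempting. */
struct model_cpu {
	int  nopreempt;         /* kpreempt_disable() nesting depth */
	bool holds_kernel_lock; /* code under kernel_lock is not MT safe */
	int  ipl;               /* current interrupt priority level */
	bool pending;           /* a preemption was requested but deferred */
};

static bool
model_may_preempt(const struct model_cpu *c)
{
	return c->nopreempt == 0 && !c->holds_kernel_lock &&
	    c->ipl == MODEL_IPL_NONE;
}

/* Returns true if preemption happens now; otherwise remembers it. */
static bool
model_request_preempt(struct model_cpu *c)
{
	if (model_may_preempt(c)) {
		c->pending = false;
		return true;
	}
	c->pending = true;
	return false;
}

static void
model_kpreempt_disable(struct model_cpu *c)
{
	c->nopreempt++;
}

/* Leaving a critical section retries any deferred preemption. */
static bool
model_kpreempt_enable(struct model_cpu *c)
{
	assert(c->nopreempt > 0);
	c->nopreempt--;
	return c->pending && model_request_preempt(c);
}
```

A request made inside a kpreempt_disable/kpreempt_enable bracket is deferred and taken on the matching enable; one made while holding kernel_lock stays pending.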


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspiration for balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches which are not
usually taken for the old code, and one forward branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forward branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symbol definition error
through code duplication by the gcc4 optimizer. Also note this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
on each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
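A minimal sketch of why the fixed-point change matters, assuming the classic 4.4BSD loadfactor()/decay_cpu() formulation (loadfac = 2 * loadavg, new = loadfac * cpu / (loadfac + FSCALE)) and FSHIFT = 11; this is an illustration, not the code from this revision. With an integer p_estcpu the truncating division removes at least 1 per call whatever the load; with p_estcpu kept in FSCALE units, the same formula can remove a fraction, so decay slows as the load average rises.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-point scale, as in BSD <sys/param.h> (assumed value). */
#define FSHIFT 11
#define FSCALE (1 << FSHIFT)

/*
 * Decay a CPU-usage estimate once, 4.4BSD style (assumed formula):
 *   loadfac = 2 * loadav            (loadav in FSCALE units)
 *   cpu'    = loadfac * cpu / (loadfac + FSCALE)
 * The same formula works whether cpu is a plain integer or is itself
 * scaled by FSCALE; only the effect of the truncating division differs.
 */
static uint64_t
decay_cpu(uint64_t loadav, uint64_t cpu)
{
	uint64_t loadfac = 2 * loadav;

	return (loadfac * cpu) / (loadfac + FSCALE);
}
```

For example, with a load average of 16 (loadav = 16 * FSCALE) the factor is 32/33: an integer estimate of 10 drops to 9, while the scaled estimate 10 * FSCALE only drops to about 9.7 * FSCALE.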


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.
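A minimal sketch of the two bit orders, assuming 32 run queues where a lower queue number means higher priority. NQS and the helper names are hypothetical, and __builtin_ctz()/__builtin_clz() are GCC/Clang builtins rather than kernel interfaces. In LSB order the first non-empty queue is the lowest set bit; in MSB order it is the count of leading zeros, which maps directly onto count-leading-zeros instructions such as PowerPC's cntlzw (presumably the motivation for the powerpc remark).

```c
#include <assert.h>
#include <stdint.h>

#define NQS 32	/* number of run queues, one bit each (assumed) */

/* Default (LSB) order: run queue q is bit q. */
static uint32_t
qbit_lsb(int q)
{
	return (uint32_t)1 << q;
}

/* Big-endian (MSB) order: run queue q is bit (NQS - 1 - q). */
static uint32_t
qbit_msb(int q)
{
	return (uint32_t)1 << (NQS - 1 - q);
}

/*
 * Return the lowest-numbered (highest-priority) non-empty queue,
 * or -1 if the mask is empty. The builtins are undefined for 0,
 * hence the guard.
 */
static int
firstq_lsb(uint32_t whichqs)
{
	return whichqs == 0 ? -1 : __builtin_ctz(whichqs);
}

static int
firstq_msb(uint32_t whichqs)
{
	return whichqs == 0 ? -1 : __builtin_clz(whichqs);
}
```

Both lookups return 3 for a mask with queues 3 and 7 marked, as long as each mask is built with its matching bit-placement helper.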


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve the comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get on appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all cpu's rather
than from a timer callout; this allows time-slicing on non-primary cpu's.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left are the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broke.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic niceing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS IDs, take two. They're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.345 26-Mar-2020 ad

Leave the idle LWPs in state LSIDL even when running, so they don't mess up
output from ps/top/etc. Correctness isn't at stake, LWPs in other states
are temporarily on the CPU at times too (e.g. LSZOMB, LSSLEEP).


# 1.344 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


# 1.343 14-Mar-2020 ad

- Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.


Revision tags: ad-namecache-base3
# 1.342 23-Feb-2020 ad

kpause(): is only awoken via timeout or signal, so use SOBJ_SLEEPQ_NULL like
_lwp_park() does, and dispense with the hashed sleepq & lock.


# 1.341 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.340 16-Feb-2020 ad

nextlwp(): fix a couple of locking bugs including one I introduced yesterday,
and add comments around same.


# 1.339 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


Revision tags: ad-namecache-base2
# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
systems that have both fast and slow CPUs, like the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller of mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread was stopped with ptrace(2), userland process must not
be able to unstop it deliberately or by an accident.

This was Windows-style behavior that made thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Move the check that no context switch happens within a pserialize read section

The previous place (pserialize_switchpoint) was not a good one because at that
point the suspect thread has already been switched out, so a backtrace taken on
a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways an LWP can be migrated to another CPU, and in
all cases the scheduler postpones the migration if the target LWP is running. One
example of LWP migration is load balancing: the scheduler periodically looks
for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it is set on an LWP,
the scheduler doesn't schedule that LWP. A scheduled LWP's migration is attempted
when it leaves its CPU, i.e., in mi_switch. And mi_switch does NOT check
the LP_BOUND flag. So if an LWP is scheduled first and then sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptability.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.
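A small arithmetic sketch of the pitfall described above, assuming (as this entry says) that kpause() treats timo == 0 as "no timeout". With integer division, hz/200 is 5 ticks at hz = 1000 but 0 at hz = 100, which silently turns a 5 ms pause into an unbounded one. usec_to_ticks_roundup() is a hypothetical helper in the spirit of the usec2ticks() routine wished for here, not an existing NetBSD function. It rounds up, so any positive duration yields at least one tick.

```c
/* The hz/n idiom: integer division truncates toward zero. */
static int
timo_from_hz_divisor(int hz, int n)
{
	return hz / n;
}

/*
 * Hypothetical microseconds-to-ticks conversion that rounds up, so
 * that usec > 0 with hz > 0 always gives at least one tick.
 */
static int
usec_to_ticks_roundup(long usec, int hz)
{
	long long ticks = ((long long)usec * hz + 999999LL) / 1000000LL;

	return (int)ticks;
}
```

At hz = 100, timo_from_hz_divisor(100, 200) is 0, while usec_to_ticks_roundup(5000, 100) is 1, so the rounded-up form can never fall into the timo == 0 case for a nonzero duration.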


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodic balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into a separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlewlp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2, the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspiration about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlewlp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forwards branches which are not
usually taken for the old code, and one forwards branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symble definition error
by code duplication of gcc4 optimizer. Also note about this in comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu are not likely increased.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve comment for ltsleep(), the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get on appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes except curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all CPUs rather
than from a timer callout; this allows time-slicing on non-primary CPUs.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broke.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a load average of one. I killed
this: it makes the behavior hard to control and almost impossible to analyze,
and the effect (~~nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) is pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.
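A short worked check of the ".75 nice" figure in the algorithm note, under stated assumptions: the pre-change update is taken to be the 4.4BSD-style one, where schedcpu() ran once per second and set p_estcpu = decay * p_estcpu + p_nice with decay d = 2L/(2L+1) for load average L, and the priority was pri = PUSER + p_estcpu/4 + 2*p_nice. These formulas are recalled from 4.4BSD rather than stated in this log, so treat this as a sketch. Ignoring real CPU usage, the nice-only part e of p_estcpu (with n = nice) converges to a fixed point:

```latex
\begin{aligned}
e_{k+1} &= d\,e_k + n, \qquad d = \frac{2L}{2L+1} \\
e^{*} &= \frac{n}{1-d} = (2L+1)\,n \\
L = 1:\quad e^{*} &= 3n, \qquad \frac{e^{*}}{4} = \frac{3}{4}\,n
\end{aligned}
```

A single addition of n contributes n/4 to the priority, one eighth of the direct 2n term; the steady state contributes 3n/4, three times that. Because the factor (2L+1) grows with load, the amplification is larger above a load average of one, which fits the "at least 3x" wording.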


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.344 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


# 1.343 14-Mar-2020 ad

- Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.


Revision tags: ad-namecache-base3
# 1.342 23-Feb-2020 ad

kpause(): is only awoken via timeout or signal, so use SOBJ_SLEEPQ_NULL like
_lwp_park() does, and dispense with the hashed sleepq & lock.


# 1.341 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.340 16-Feb-2020 ad

nextlwp(): fix a couple of locking bugs including one I introduced yesterday,
and add comments around same.


# 1.339 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


Revision tags: ad-namecache-base2
# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, like with the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over-eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread was stopped with ptrace(2), userland process must not
be able to unstop it deliberately or by an accident.

This was a Windows-style behavior that made thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point the suspect thread has already been switched out, so a backtrace taken on
a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP is migrated to another CPU, and in
all cases the scheduler postpones the migration if the target LWP is running. One
example of LWP migration is load balancing; the scheduler periodically
looks for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it's set on an LWP,
the scheduler doesn't schedule that LWP. The scheduler attempts to migrate a
scheduled LWP when it leaves a running CPU, i.e., in mi_switch. And mi_switch
does NOT check the LP_BOUND flag. So if an LWP is scheduled first and then sets
the LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptibility.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit (hz/n
truncates to 0 once hz < n). With a usec2ticks() routine it would
at least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu; it should already be correct in the returned LWP (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the former.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodic balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu, causing
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspiration for balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forwards branches which are not
usually taken for the old code, and one forwards branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable the asm statement which defines the bpendtsleep symbol as a "handy
breakpoint" on all m68k ports, since code duplication by the gcc4 optimizer
may cause a multiple symbol definition error. Also add a note about this in
a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels, as the
output asm includes it twice, causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't dereference l if you are going to test whether it is NULL a
couple of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve the comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get on appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process, so we can't restart scheduling;
this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of users process, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all cpu's rather
than from a timer callout; this allows time-slicing on non-primary cpu's.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* function need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this: it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS IDs, take two. They're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly machine-independent switch, and changes to match. Also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.342 23-Feb-2020 ad

kpause(): is only awoken via timeout or signal, so use SOBJ_SLEEPQ_NULL like
_lwp_park() does, and dispense with the hashed sleepq & lock.


# 1.341 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.340 16-Feb-2020 ad

nextlwp(): fix a couple of locking bugs including one I introduced yesterday,
and add comments around same.


# 1.339 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


Revision tags: ad-namecache-base2
# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, like with the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an overeager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread has been stopped with ptrace(2), the userland process must not
be able to unstop it, whether deliberately or by accident.

This was Windows-style behavior that made thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change where we check that a context switch does not happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good one because at that
point the suspect thread has already been switched out, so a backtrace taken on
a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to a state where it would be
used and usable, due to reliability problems and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP can be migrated to another CPU, and
in any case the scheduler postpones a migration if the target LWP is running. One
example of LWP migration is load balancing: the scheduler periodically looks
for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it is set on an LWP,
the scheduler doesn't schedule that LWP. Migration of a scheduled LWP is attempted
when it leaves the CPU it is running on, i.e., in mi_switch. And mi_switch does NOT
check the LP_BOUND flag. So if an LWP is scheduled first and then sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsuya Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while the migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptibility.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo (integer division yields 0 once
hz < n). This leads to unwanted behaviour when hz gets below some
impossible-to-know limit. With a usec2ticks() routine it would at
least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu; it should already be correct in the returned LWP (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- A few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplifcation for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodic balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into a separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.
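
An illustrative sketch (not the code from 1.231) of how the three deferral
conditions above combine: preemption is deferred if any one of them holds,
and kpreempt_disable/kpreempt_enable nest via a counter. The struct, its
fields and the sketch_* functions are hypothetical, and SKETCH_IPL_NONE = 0
is an assumption standing in for IPL_NONE.

```c
#include <stdbool.h>

/*
 * Hypothetical per-LWP preemption state. These names are illustrative
 * and are not the real NetBSD struct lwp / struct cpu_info fields.
 */
#define SKETCH_IPL_NONE	0	/* assumed: lowest IPL, like IPL_NONE */

struct sketch_preempt_state {
	int	biglock_count;	/* nesting depth of kernel_lock holds */
	int	nopreempt;	/* kpreempt_disable() nesting depth */
	int	ipl;		/* current interrupt priority level */
};

/* Bracket a critical section; calls may nest. */
static void
sketch_kpreempt_disable(struct sketch_preempt_state *s)
{
	s->nopreempt++;
}

static void
sketch_kpreempt_enable(struct sketch_preempt_state *s)
{
	s->nopreempt--;
}

/* True if any of the three deferral conditions holds. */
static bool
sketch_preemption_deferred(const struct sketch_preempt_state *s)
{
	return s->biglock_count > 0 ||	/* code is not MT safe */
	    s->nopreempt > 0 ||		/* explicit critical section */
	    s->ipl > SKETCH_IPL_NONE;	/* raised interrupt priority */
}
```

Because the disable/enable pair only adjusts a depth counter, a nested
critical section keeps preemption deferred until the outermost
kpreempt_enable.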


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu, causing
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is prepared to support CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches which are not
usually taken for the old code, and one forward branch that is usually taken
for my 'improved version'. Since (IIRC) both Athlon and P4 will predict
forward branches 'not taken', the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable the asm statement which defines the bpendtsleep symbol as a "handy
breakpoint" on all m68k ports, since code duplication by the gcc4 optimizer
may cause a multiple symbol definition error. Also note this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
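
A minimal C sketch of the arithmetic behind point 1, assuming the classic
4.4BSD decay formula (loadfactor = 2 * loadav, decay_cpu = loadfac * cpu /
(loadfac + FSCALE)) with FSHIFT = 11; it is not the code committed here.
With integer truncation any non-zero estcpu loses at least 1 per second,
whereas a value scaled by FSCALE keeps its fractional part.

```c
#include <stdint.h>

/*
 * Assumed constants and formula modelled on the classic 4.4BSD
 * scheduler (FSHIFT from <sys/param.h>, loadfactor/decay_cpu from
 * kern_synch.c). Illustrative only.
 */
#define FSHIFT	11
#define FSCALE	(1u << FSHIFT)

/* Per-second decay: cpu * 2L / (2L + 1), with L the fixpt load average. */
static uint64_t
decay_cpu(uint32_t loadav, uint64_t cpu)
{
	uint64_t loadfac = 2 * (uint64_t)loadav;	/* loadfactor() */

	return (loadfac * cpu) / (loadfac + FSCALE);
}
```

At a load average of 20 (loadav = 20 * FSCALE), decay_cpu(loadav, 1)
truncates to 0, while decay_cpu(loadav, FSCALE) yields 1998, so about
0.976 of a tick survives the decay instead of the whole tick being lost.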


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.
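
To illustrate why MSB ordering helps some MD code, here is a hypothetical
sketch (not NetBSD's setrunqueue/remrunqueue) of a 32-queue bitmap,
assuming queue 0 is the highest priority. In LSB order the first non-empty
queue is found by counting trailing zeros (what ffs() computes, minus one);
in MSB order it is found by counting leading zeros, which PowerPC does in a
single cntlzw instruction. Portable loops stand in for those instructions.

```c
#include <stdint.h>

/* Illustrative only: 32 run queues, queue 0 = highest priority. */
#define NQS	32

/* LSB order (the default): queue q is bit q. */
static uint32_t
whichqs_bit_lsb(unsigned q)
{
	return (uint32_t)1 << q;
}

/* MSB order (__HAVE_BIGENDIAN_BITOPS): queue q is bit 31 - q. */
static uint32_t
whichqs_bit_msb(unsigned q)
{
	return (uint32_t)0x80000000u >> q;
}

/* Count trailing zeros (ffs() - 1 for non-zero input). */
static int
ctz32(uint32_t x)
{
	int n = 0;

	if (x == 0)
		return NQS;
	while ((x & 1u) == 0) {
		x >>= 1;
		n++;
	}
	return n;
}

/* Count leading zeros (one cntlzw instruction on PowerPC). */
static int
clz32(uint32_t x)
{
	int n = 0;

	if (x == 0)
		return NQS;
	while ((x & 0x80000000u) == 0) {
		x <<= 1;
		n++;
	}
	return n;
}

/* Index of the highest-priority non-empty queue, or -1 if all empty. */
static int
first_queue_lsb(uint32_t whichqs)
{
	return whichqs == 0 ? -1 : ctz32(whichqs);
}

static int
first_queue_msb(uint32_t whichqs)
{
	return whichqs == 0 ? -1 : clz32(whichqs);
}
```

Both orderings select the same queue; only the bit-scan primitive differs,
so an architecture can pick whichever its hardware computes cheaply.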


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve comment for ltsleep(), the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get the appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Also stop runnable but swapped-out user processes in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes except curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all cpu's rather
than from a timer callout; this allows time-slicing on non-primary cpu's.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.340 16-Feb-2020 ad

nextlwp(): fix a couple of locking bugs including one I introduced yesterday,
and add comments around same.


# 1.339 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


Revision tags: ad-namecache-base2
# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, like with the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller of mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread has been stopped with ptrace(2), the userland process must
not be able to unstop it, either deliberately or by accident.

This was a Windows-style behavior that makes thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point a suspect thread has already been switched out, so a backtrace obtained on
a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on an SMP machine with few processes running (e.g. while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP can be migrated to another CPU, and
in all cases the scheduler postpones the migration if the target LWP is running.
One example of LWP migration is load balancing; the scheduler periodically
looks for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it's set on an LWP,
the scheduler doesn't schedule that LWP. The scheduler attempts to migrate a
scheduled LWP when it leaves a running CPU, i.e., in mi_switch. And mi_switch does
NOT check the LP_BOUND flag. So if an LWP is scheduled first and then sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html
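A minimal C sketch of the ordering problem described above. All names here
(hyp_lwp, hyp_curlwp_bind, hyp_switch_migrate, HYP_LP_BOUND) are hypothetical
stand-ins, not the kernel's real structures: a migration target is picked
first, the LWP binds afterwards, and only the switch-time flag check keeps
the LWP on its CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the LP_BOUND bit in l_pflags. */
#define HYP_LP_BOUND 0x1u

struct hyp_cpu {
	int id;
};

struct hyp_lwp {
	unsigned int pflags;        /* models l_pflags */
	struct hyp_cpu *cpu;        /* CPU the LWP is running on */
	struct hyp_cpu *target_cpu; /* migration target picked earlier, or NULL */
};

/* Models curlwp_bind(): pin the LWP to its current CPU. */
static void
hyp_curlwp_bind(struct hyp_lwp *l)
{
	l->pflags |= HYP_LP_BOUND;
}

/*
 * Models the migration step taken when the LWP leaves its CPU.
 * Without the LP_BOUND test, a target chosen before the LWP bound
 * itself would still be honoured. This sketch simply discards the
 * pending request; the log does not say what the kernel does with it.
 */
static void
hyp_switch_migrate(struct hyp_lwp *l)
{
	assert(l->cpu != NULL);
	if (l->target_cpu == NULL)
		return;
	if ((l->pflags & HYP_LP_BOUND) == 0)
		l->cpu = l->target_cpu;
	l->target_cpu = NULL;
}
```

Dropping the HYP_LP_BOUND test from hyp_switch_migrate reproduces the race:
the bound LWP would end up on the target CPU.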


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
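A short C sketch of how the three separate fields can be folded back into the
composite status word that wait(2) reports. The bit layout (signal number in
bits 0-6, core-dump flag 0200, exit code in bits 8-15) is the traditional BSD
encoding and is assumed here; hyp_proc and hyp_wait_status are hypothetical
names, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed traditional BSD core-dump bit in a wait(2) status word. */
#define HYP_WCOREFLAG 0200

/* Hypothetical per-process fields mirroring the split above. */
struct hyp_proc {
	int xexit;     /* exit code (models p_xexit) */
	int xsig;      /* terminating signal, 0 if none (models p_xsig) */
	bool coredump; /* models the core-dump flag bit */
};

/* Rebuild the composite status word from the separate fields. */
static int
hyp_wait_status(const struct hyp_proc *p)
{
	assert(p->xexit >= 0 && p->xexit <= 0xff);
	assert(p->xsig >= 0 && p->xsig < 0x7f); /* 0x7f marks "stopped" */
	int status = (p->xexit << 8) | p->xsig;
	if (p->coredump)
		status |= HYP_WCOREFLAG;
	return status;
}
```

Keeping the fields apart means each consumer reads exactly the part it needs
instead of decoding a value whose meaning depends on context.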


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsuya Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

Revert revision 1.292. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove an invalid assert and add a note that preemption or an
interrupt may happen while the migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT that we don't kpause indefinitely without interruptibility.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.
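The hz/n pitfall is easy to see in a small C sketch: with hz = 50, hz/100
truncates to 0, which kpause would read as "sleep forever". The helper below
is hypothetical (it is not the usec2ticks() the message wishes for); it
simply rounds up so that any positive delay maps to at least one tick.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: convert a delay in milliseconds to clock ticks,
 * rounding up so that a positive delay never becomes 0 ticks.
 */
static int
hyp_ms2ticks(int ms, int hz)
{
	assert(ms > 0 && hz > 0);
	long long ticks = ((long long)ms * hz + 999) / 1000;
	return ticks > INT_MAX ? INT_MAX : (int)ticks;
}
```

Rounding up trades a slightly longer sleep for never silently turning a short
timeout into an unbounded one.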


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync
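A minimal C11 sketch of the invariant this CTASSERT protects: two parallel
arrays, one of decay factors and one of load averages, updated in lockstep.
The fixed-point layout (FSHIFT 11) and the decay constants exp(-1/12),
exp(-1/60) and exp(-1/180) (5-second samples over 1, 5 and 15 minutes) follow
the traditional BSD load-average scheme and are assumptions here; all names
are hypothetical, and _Static_assert stands in for the kernel's CTASSERT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HYP_FSHIFT 11
#define HYP_FSCALE (1 << HYP_FSHIFT)
#define HYP_NELEM(a) (sizeof(a) / sizeof((a)[0]))

typedef uint32_t hyp_fixpt_t;

/* Per-sample decay factors in fixed point (assumed values). */
static const hyp_fixpt_t hyp_cexp[] = {
	(hyp_fixpt_t)(0.9200444146293232 * HYP_FSCALE), /* 1 minute */
	(hyp_fixpt_t)(0.9834714538216174 * HYP_FSCALE), /* 5 minutes */
	(hyp_fixpt_t)(0.9944598480048967 * HYP_FSCALE), /* 15 minutes */
};

static hyp_fixpt_t hyp_ldavg[3];

/* Adding a decay factor without a matching average fails to compile. */
_Static_assert(HYP_NELEM(hyp_cexp) == HYP_NELEM(hyp_ldavg),
    "cexp and ldavg must have the same number of entries");

/* Exponentially decay each average towards nrun (in fixed point). */
static void
hyp_loadav_update(unsigned int nrun)
{
	for (size_t i = 0; i < HYP_NELEM(hyp_ldavg); i++)
		hyp_ldavg[i] = (hyp_cexp[i] * hyp_ldavg[i] +
		    nrun * HYP_FSCALE * (HYP_FSCALE - hyp_cexp[i])) >> HYP_FSHIFT;
}
```

With a constant single runnable LWP, the 1-minute average climbs towards
HYP_FSCALE (1.0 in fixed point) within a few dozen samples.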


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloc'ed and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu; it should already be correct on return (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the former.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.
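A minimal sketch, using only <sys/queue.h>, of what "a sleepq_t is just a
TAILQ_HEAD" means in practice. The hyp_ names are hypothetical; the mutex
pointer and waiter count that this change moved out are deliberately absent,
and the queue here is plain FIFO, which is a simplification (the real sleep
queue may order waiters differently).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct hyp_lwp {
	int id;
	TAILQ_ENTRY(hyp_lwp) sleepchain; /* linkage on a sleep queue */
};

/* The whole sleep queue type: nothing but a tail-queue head. */
TAILQ_HEAD(hyp_sleepq, hyp_lwp);
typedef struct hyp_sleepq hyp_sleepq_t;

static void
hyp_sleepq_enqueue(hyp_sleepq_t *sq, struct hyp_lwp *l)
{
	assert(l != NULL);
	TAILQ_INSERT_TAIL(sq, l, sleepchain);
}

/* Remove and return the first waiter, or NULL if the queue is empty. */
static struct hyp_lwp *
hyp_sleepq_wake_one(hyp_sleepq_t *sq)
{
	struct hyp_lwp *l = TAILQ_FIRST(sq);
	if (l != NULL)
		TAILQ_REMOVE(sq, l, sleepchain);
	return l;
}
```

Since the head carries no lock of its own, whoever manipulates the queue must
already hold whatever lock protects it.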


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodic balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.
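The three deferral conditions above can be modelled as a single predicate.
This C sketch is hypothetical throughout (hyp_preempt_state and its fields are
not the kernel's real per-CPU or per-LWP state); it only shows how nesting of
kpreempt_disable/kpreempt_enable combines with the kernel_lock and IPL checks.

```c
#include <assert.h>
#include <stdbool.h>

#define HYP_IPL_NONE 0

/* Hypothetical snapshot of the state consulted before preempting. */
struct hyp_preempt_state {
	int biglocks;  /* kernel_lock hold count */
	int nopreempt; /* kpreempt_disable() nesting depth */
	int ipl;       /* current interrupt priority level */
};

static void
hyp_kpreempt_disable(struct hyp_preempt_state *s)
{
	s->nopreempt++;
}

static void
hyp_kpreempt_enable(struct hyp_preempt_state *s)
{
	assert(s->nopreempt > 0);
	s->nopreempt--;
}

/* True if any of the three deferral conditions holds. */
static bool
hyp_kpreempt_deferred(const struct hyp_preempt_state *s)
{
	return s->biglocks > 0 ||  /* code may not be MT safe */
	    s->nopreempt > 0 ||    /* inside a critical section */
	    s->ipl > HYP_IPL_NONE; /* interrupt priority raised */
}
```

Using a counter rather than a flag lets critical sections nest: preemption
becomes possible again only after the outermost enable.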


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2, the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspiration for balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches that are not
usually taken for the old code, and one forward branch that is usually taken
for my 'improved version'. Since (IIRC) both Athlon and P4 predict forward
branches as 'not taken', the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symbol definition error
due to code duplication by the gcc4 optimizer. Also note this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve comment for ltsleep(), the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get the appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all CPUs rather
than from a timer callout; this allows time-slicing on non-primary CPUs.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS IDs, take two. They're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. Also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and call startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.339 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


Revision tags: ad-namecache-base2
# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, like with the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over-eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller of mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread has been stopped with ptrace(2), the userland process must not
be able to unstop it, deliberately or by accident.

This was Windows-style behavior that made thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good one because by that
point the suspect thread has already been switched out, so a backtrace obtained on
a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to a state where it would be
used and usable, due to reliability issues and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways an LWP can be migrated to another CPU, and in
all cases the scheduler postpones the migration if the target LWP is running. One
example of LWP migration is load balancing: the scheduler periodically
looks for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it is set on an LWP,
the scheduler doesn't schedule that LWP. The scheduler then tries to migrate a
scheduled LWP when it leaves the CPU it is running on, i.e., in mi_switch. But
mi_switch does NOT check the LP_BOUND flag. So if an LWP is scheduled first and
then sets the LP_BOUND flag, the LWP can be migrated regardless of the flag. To
avoid this race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsuya Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while the migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptability.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu; it should already be correct on return (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans on the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@).

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- A few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodic balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tunable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more, but applications do, and dropping to zero now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
need be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu, causing
a lock against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2, the implementation of a new scheduler based on the
original SVR4 approach, with some ideas about balancing and migration
borrowed from Solaris. It implements per-CPU run queues, provides real-time
(RT) and time-sharing (TS) queues, is ready to support the POSIX real-time
extensions, and is also prepared for CPU affinity support.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago.


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches which are not
usually taken for the old code, and one forward branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable the asm statement which defines the bpendtsleep symbol as a "handy
breakpoint" on all m68k ports, since it may cause a multiple symbol definition
error due to code duplication by the gcc4 optimizer. Also note this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
on each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get the appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes except curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all cpu's rather
than from a timer callout; this allows time-slicing on non-primary cpu's.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a load average of one. I killed
this: it makes the behavior hard to control, almost impossible to analyze,
and the effect (roughly nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) is pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.338 24-Jan-2020 ad

Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
systems that have both fast and slow CPUs, like the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over-eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller of mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread was stopped with ptrace(2), userland process must not
be able to unstop it deliberately or by an accident.

This was a Windows-style behavior that makes thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point the suspect thread has already been switched out, so a backtrace taken on
a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces that do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP can be migrated to another CPU, and in
all cases the scheduler postpones the migration if the target LWP is running. One
example of LWP migration is load balancing; the scheduler periodically
looks for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it is set on an LWP,
the scheduler doesn't schedule that LWP. The migration of a scheduled LWP is
attempted when it leaves its CPU, i.e., in mi_switch. And mi_switch does NOT check
the LP_BOUND flag. So if an LWP is scheduled first and then sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsuya Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while the migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptibility.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu; it should already be correct on return (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans on the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it there, and
it will be handled at EOI anyway. This can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodic balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspiration for balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for CPU affinity support.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches which are not
usually taken for the old code, and one forward branch that is usually taken
for my 'improved version'. Since (IIRC) both Athlon and P4 will predict
forward branches 'not taken', the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable the asm statement which defines the bpendtsleep symbol as a "handy breakpoint"
on all m68k ports, since it may cause a multiple symbol definition error
due to code duplication by the gcc4 optimizer. Also note this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
on each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve the comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get on appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process, so we can't restart scheduling;
this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes except curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all CPUs rather
than from a timer callout; this allows time-slicing on non-primary CPUs.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this; it makes the behavior hard to control, almost impossible to analyze,
and the effect (almost nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic niceing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS IDs, take two. They're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.337 22-Jan-2020 ad

- DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com


Revision tags: ad-namecache-base1
# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, like with the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread was stopped with ptrace(2), userland process must not
be able to unstop it deliberately or by an accident.

This was a Windows-style behavior that makes thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point a suspect thread is already switched so that a backtrace gotten on
a KASSERT failure doesn't point out where a context switch happens.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into kernel in first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag to l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP is migrated to another CPU and in
any case the scheduler postpones a migration if a target LWP is running. One
example of LWP migration is load balancing; the scheduler periodically
explores CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag and, if it's set on an LWP,
the scheduler doesn't schedule that LWP. The actual migration of a scheduled LWP
is attempted when it leaves a running CPU, i.e., in mi_switch. And mi_switch does NOT check
the LP_BOUND flag. So if an LWP is scheduled first and then it sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsuya Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptability.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline function that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu, causing the
LWP to lock against itself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlewlp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for CPU affinity support.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlewlp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forwards branches which are not
usually taken for the old code, and one forwards branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symble definition error
by code duplication of gcc4 optimizer. Also note about this in comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that it was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with a unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve comment for ltsleep(), the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get on appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process, so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of users process, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all CPUs rather
than from a timer callout; this allows time-slicing on non-primary CPUs.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10-20% of the CPU (or even more, depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a load average of one. I killed
this: it makes the behavior hard to control, almost impossible to analyze,
and the effect (almost nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS IDs, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.336 09-Jan-2020 ad

- Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
systems that have both fast and slow CPUs, like the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an overeager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller of mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread has been stopped with ptrace(2), the userland process must
not be able to unstop it, either deliberately or by accident.

This was Windows-style behavior that made thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Move the check that no context switch happens within a pserialize read section

The previous place (pserialize_switchpoint) was a poor choice because by that
point the suspect thread has already been switched out, so a backtrace
obtained on a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on an SMP machine with few processes running (e.g. while
running sysinst), pserialize could hang for a long time until all
CPUs got an LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag to l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP is migrated to another CPU, and in
all cases the scheduler postpones a migration if the target LWP is running. One
example of LWP migration is load balancing; the scheduler periodically
looks for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it's set on an LWP,
the scheduler doesn't schedule that LWP. The scheduler tries to migrate a
scheduled LWP when the LWP is leaving the CPU it runs on, i.e., in mi_switch.
And mi_switch does NOT check the LP_BOUND flag. So if an LWP is scheduled first
and then it sets the LP_BOUND flag, the LWP can be migrated regardless of the
flag. To avoid this race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsuya Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptibility.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into a separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspiration for balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared to support CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forwards branches which are not
usually taken for the old code, and one forwards branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symbol definition error
due to code duplication by the gcc4 optimizer. Also note this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery; all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve the comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get the appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all cpus rather
than from a timer callout; this allows time-slicing on non-primary cpus.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock during
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) them.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a load average of one. I killed
this: it makes the behavior hard to control, almost impossible to analyze,
and the effect (~nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.335 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.334 21-Dec-2019 ad

schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an overeager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller of mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread has been stopped with ptrace(2), a userland process must not
be able to unstop it, deliberately or by accident.

This was a Windows-style behavior that made thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point a suspect thread is already switched so that a backtrace gotten on
a KASSERT failure doesn't point out where a context switch happens.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to a state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces that do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag to l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP is migrated to another CPU, and in
any case the scheduler postpones a migration if the target LWP is running. One
example of LWP migration is load balancing; the scheduler periodically
scans for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag and, if it's set on an LWP,
the scheduler doesn't schedule that LWP. The scheduler tries to migrate a scheduled
LWP when it is leaving a running CPU, i.e., in mi_switch. And mi_switch does NOT check
the LP_BOUND flag. So if an LWP is scheduled first and then sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
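A minimal C sketch of how a composite wait(2) status for a terminated process can be rebuilt from the three split fields, assuming the traditional BSD encoding (exit code in bits 8-15, terminating signal in the low 7 bits, core-dump flag at 0200). compose_wait_status() and SKETCH_WCOREFLAG are hypothetical names used only for illustration; stopped-process status is not handled.

```c
#include <stdbool.h>

/* Hypothetical: core-dump bit in a traditional wait(2) status word. */
#define SKETCH_WCOREFLAG 0200

/*
 * Hypothetical helper: rebuild a terminated process's composite status
 * from separate exit code, signal number and core-dump flag.
 * A nonzero signal means the process was killed by that signal;
 * otherwise it exited normally with xexit.
 */
static int
compose_wait_status(int xexit, int xsig, bool coredumped)
{
	if (xsig != 0)
		return (xsig & 0177) | (coredumped ? SKETCH_WCOREFLAG : 0);
	return (xexit & 0377) << 8;
}
```

Keeping the three fields separate and composing the status only when it is reported avoids overloading one integer with two meanings depending on context.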


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html
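A userland sketch of the variadic style described above, assuming a hypothetical SKETCH_ASSERTMSG macro (not the kernel's KASSERTMSG, which panics rather than aborting): on failure it reports the failed expression, then the caller's printf-style message with any optional arguments.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical userland stand-in for a variadic *ASSERTMSG() macro.
 * The first variadic argument is the format string; any further
 * arguments are passed through to fprintf() unchanged.
 */
#define SKETCH_ASSERTMSG(e, ...) do {					\
	if (!(e)) {							\
		fprintf(stderr, "assertion \"%s\" failed: ", #e);	\
		fprintf(stderr, __VA_ARGS__);				\
		fputc('\n', stderr);					\
		abort();						\
	}								\
} while (0)
```

Because fprintf() receives its format from __VA_ARGS__, at least a format string is required after the expression; extra arguments are optional.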


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptability.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.
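To make the hz/n problem concrete, here is a small C sketch, assuming (as described above) that a timeout of 0 ticks means "no timeout". The integer division hz / n truncates to 0 whenever hz < n; the hypothetical usec_to_ticks() rounds up so any positive interval yields at least one tick. Neither function is the kernel's actual code.

```c
#include <limits.h>
#include <stdint.h>

/* The hz/n idiom: integer division truncates, so hz < n gives 0 ticks. */
static int
ticks_hz_over_n(int hz, int n)
{
	return hz / n;
}

/*
 * Hypothetical conversion from microseconds to clock ticks.
 * Rounds up, so for hz > 0 and usec > 0 the result is at least 1
 * and a short sleep can never turn into an unbounded one.
 */
static int
usec_to_ticks(int hz, int64_t usec)
{
	int64_t t = (usec * hz + 999999) / 1000000;

	return t > INT_MAX ? INT_MAX : (int)t;
}
```

For example, a 10 ms sleep written as hz / 100 becomes 0 ticks at hz = 64, while usec_to_ticks(64, 10000) returns 1.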


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches which are not
usually taken for the old code, and one forward branch that is usually taken
for my 'improved version'. Since (IIRC) both Athlon and P4 predict
forward branches 'not taken', the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports, since code duplication by the gcc4 optimizer may cause
a multiple symbol definition error. Also note this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
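
A hypothetical simulation (not the kern_synch.c code) of points 1 and 2
above. It assumes the classic 4.4BSD decay factor k = 2*load/(2*load+1)
applied once per second, one schedclock() tick every `period` seconds for
the observed process (16 Hz shared by 32 runnable processes is roughly one
tick every 2 seconds), and 11 fractional bits for the fixed-point case.
simulate() and SIM_FSCALE are names made up for this sketch.

```c
#include <stdint.h>

/* Hypothetical fixed-point scale for this sketch: 11 fractional bits. */
#define SIM_FSHIFT 11
#define SIM_FSCALE (1u << SIM_FSHIFT)

/*
 * Simulate `seconds` seconds of one process's estcpu.  The process gets
 * one schedclock() tick every `period` seconds, and the value is decayed
 * once per second by k = 2*load / (2*load + 1).  `scale` is 1 to model an
 * integer estcpu, or SIM_FSCALE to model a fixed-point one.  Returns the
 * final value in whole ticks (fraction truncated).
 */
static uint32_t
simulate(uint32_t load, uint32_t period, uint32_t seconds, uint32_t scale)
{
	uint64_t e = 0;

	for (uint32_t s = 1; s <= seconds; s++) {
		if (s % period == 0)
			e += scale;		/* one tick: +1.0 */
		/*
		 * Once-per-second decay.  Integer division truncates, so
		 * with scale == 1 any nonzero value loses at least 1 here.
		 */
		e = e * (2 * (uint64_t)load) / (2 * (uint64_t)load + 1);
	}
	return (uint32_t)(e / scale);
}
```

With load 32 and a tick every 2 seconds, simulate(32, 2, 1000, 1) stays
at 0, because each lone tick is truncated away by the next decay, while
simulate(32, 2, 1000, SIM_FSCALE) settles at about 32 ticks.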


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
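
A hypothetical userland model of iterating with marker entries, as
described above; it is not the NetBSD implementation. struct item,
foreach_call() and sum_and_remove() are invented names, the LIST macros
come from <sys/queue.h>, and the lock the kernel would drop around the
callback is only indicated by comments.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element; `marker` flags iterator placeholders. */
struct item {
	LIST_ENTRY(item) link;
	bool marker;
	int value;
};
LIST_HEAD(itemlist, item);

/*
 * Call fn on every real item.  A marker is linked in after the current
 * item before the callback runs, so iteration can resume from the marker
 * even if the callback (or another thread, once the lock is dropped)
 * removes the current item.  Markers left by other iterators are skipped.
 */
static int
foreach_call(struct itemlist *list, int (*fn)(struct item *, void *),
    void *arg)
{
	struct item marker = { .marker = true };
	struct item *it = LIST_FIRST(list);
	int error = 0;

	while (it != NULL) {
		if (it->marker) {
			it = LIST_NEXT(it, link);
			continue;
		}
		LIST_INSERT_AFTER(it, &marker, link);
		/* unlock(list); */
		error = fn(it, arg);
		/* lock(list); */
		it = LIST_NEXT(&marker, link);
		LIST_REMOVE(&marker, link);
		if (error != 0)
			break;
	}
	return error;
}

/* Example callback: accumulate values and unlink the visited item. */
static int
sum_and_remove(struct item *it, void *arg)
{
	*(int *)arg += it->value;
	LIST_REMOVE(it, link);	/* safe: iteration resumes at the marker */
	return 0;
}
```

The design choice is that the iterator never holds a pointer to an item
across the callback; it only trusts its own marker, which nobody else
removes.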


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

Fix the problem of truncated output from ptys. Fix by enami.
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.
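
A sketch of the two bit orderings for sched_whichqs, assuming 32 run
queues with queue 0 as the highest priority (the pre-1.204 priority
space). The helper names are hypothetical, and __builtin_ctz() and
__builtin_clz() are GCC/Clang builtins standing in for MD find-first-set
code. Presumably the MSB layout suits CPUs such as PowerPC that have a
count-leading-zeros instruction (cntlzw).

```c
#include <stdint.h>

/* Bit for run queue q (0..31) in each layout. */
static inline uint32_t
qbit_lsb(unsigned q)
{
	return UINT32_C(1) << q;		/* queue 0 is bit 0 */
}

static inline uint32_t
qbit_msb(unsigned q)
{
	return UINT32_C(0x80000000) >> q;	/* queue 0 is bit 31 */
}

/* Index of the first (lowest-numbered) non-empty queue, or -1 if none. */
static int
firstq_lsb(uint32_t whichqs)
{
	/* count trailing zeros: position of the lowest set bit */
	return whichqs != 0 ? __builtin_ctz(whichqs) : -1;
}

static int
firstq_msb(uint32_t whichqs)
{
	/* count leading zeros: distance of the highest set bit from bit 31 */
	return whichqs != 0 ? __builtin_clz(whichqs) : -1;
}
```

Both layouts answer the same question; only the instruction used to scan
the bitmap differs.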


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve the comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use the SIGACTION() macro to get the appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process, so we can't restart scheduling;
this can therefore only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of users process, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes except curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all CPUs rather
than from a timer callout; this allows time-slicing on non-primary CPUs.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which CPU a process is running on/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left are the
header files, as UVM still uses (most of) them.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS IDs, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.334 21-Dec-2019 ad

schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.


# 1.333 20-Dec-2019 ad

Use CPU_COUNT() to update nswtch. No functional change.


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller of mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread has been stopped with ptrace(2), the userland process must not
be able to unstop it, deliberately or by accident.

This was a Windows-style behavior that makes thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point the suspect thread has already been switched out, so a backtrace obtained on
a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on an SMP machine with few processes running (e.g. while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to a state where it would be
used and usable, due to reliability problems and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag to l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP is migrated to another CPU, and in
all cases the scheduler postpones a migration if the target LWP is running. One
example of LWP migration is load balancing; the scheduler periodically
looks for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it is set on an LWP,
the scheduler doesn't schedule that LWP. The scheduler attempts to migrate a
scheduled LWP when it leaves its CPU, i.e., in mi_switch. And mi_switch does NOT
check the LP_BOUND flag. So if an LWP is scheduled first and then sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsuya Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptability.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via the idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodic balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case when l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of a new scheduler, which is based
on the original approach of SVR4, with some inspiration for balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared to support CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches which are not
usually taken for the old code, and one forward branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forward branches 'not taken', the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symble definition error
by code duplication in the gcc4 optimizer. Also add a note about this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
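A minimal C sketch of the arithmetic behind this change, assuming the classic 4.4BSD decay factor 2L/(2L+1) applied once per second, a load average L equal to the number of runnable processes, and 11 fractional bits for the fixed-point case. The decay() and simulate() helpers are hypothetical illustrations, not the kernel's code. With 32 runnable processes each one is charged only every other second, and integer truncation wipes out that single unit at the next decay (1 * 64 / 65 == 0); carrying fractional bits lets the charge accumulate.

```c
#include <stdint.h>

/* Assumption: 11 fractional bits, mirroring the FSHIFT/FSCALE convention
 * used for load averages; not necessarily the kernel's exact macros. */
#define FSHIFT 11
#define FSCALE (1u << FSHIFT)

/* 4.4BSD-style decay: estcpu *= 2L / (2L + 1), L = load average
 * (a whole number here for simplicity). Truncates toward zero. */
static uint32_t decay(uint32_t estcpu, uint32_t load)
{
	return (uint32_t)((uint64_t)estcpu * (2 * load) / (2 * load + 1));
}

/*
 * Hypothetical simulation: `nproc` runnable processes share one CPU and
 * schedclock() fires at ~16 Hz, so each process is charged one unit about
 * every nproc/16 seconds. `unit` is 1 for integer estcpu and FSCALE for
 * fixed-point estcpu. Returns one process's estcpu after `secs` seconds.
 */
static uint32_t simulate(uint32_t unit, uint32_t nproc, uint32_t load,
    uint32_t secs)
{
	uint32_t estcpu = 0;
	uint32_t period = (nproc + 15) / 16;	/* seconds between charges */

	for (uint32_t s = 1; s <= secs; s++) {
		if (s % period == 0)
			estcpu += unit;			/* schedclock() charge */
		estcpu = decay(estcpu, load);		/* once-per-second decay */
	}
	return estcpu;
}
```

For example, simulate(1, 32, 32, 60) returns 0 because every integer charge is truncated away, while simulate(FSCALE, 32, 32, 60) returns a nonzero value.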


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve the comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get on appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes except curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all cpu's rather
than from a timer callout; this allows time-slicing on non-primary cpu's.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
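Here is a minimal sketch of the two properties the commit message claims. The caller owns the handle, and arming or cancelling it only relinks an intrusive doubly linked list, so both operations are O(1) and neither allocates. The struct co type and the co_* functions are hypothetical. They are loosely modeled on the callout_reset()/callout_stop() pattern of callout(9). The sketch does not attempt to model how the real kernel orders pending callouts or fires them on clock ticks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callout handle. The client supplies the storage (e.g.
 * embedded in its own softc), so scheduling a callout never allocates. */
struct co {
	struct co *prev, *next;	/* intrusive links into the pending list */
	void (*func)(void *);
	void *arg;
	int ticks;		/* requested delay; not acted on in this sketch */
	bool pending;
};

/* Sentinel-headed circular list: insert and remove need no NULL checks. */
struct co_list {
	struct co head;
};

static void co_list_init(struct co_list *l)
{
	l->head.prev = l->head.next = &l->head;
}

/* Must be called once on client-supplied storage before first use. */
static void co_init(struct co *c)
{
	c->prev = c->next = NULL;
	c->func = NULL;
	c->arg = NULL;
	c->ticks = 0;
	c->pending = false;
}

/* O(1): unlink the handle using only its own prev/next pointers. */
static void co_stop(struct co *c)
{
	if (!c->pending)
		return;
	assert(c->prev != NULL && c->next != NULL);
	c->prev->next = c->next;
	c->next->prev = c->prev;
	c->prev = c->next = NULL;
	c->pending = false;
}

/* O(1): cancel if already pending, then link the handle at the tail. */
static void co_reset(struct co_list *l, struct co *c, int ticks,
    void (*func)(void *), void *arg)
{
	co_stop(c);
	c->func = func;
	c->arg = arg;
	c->ticks = ticks;
	c->prev = l->head.prev;
	c->next = &l->head;
	l->head.prev->next = c;
	l->head.prev = c;
	c->pending = true;
}

/* Count pending callouts (a linear walk, for testing only). */
static size_t co_count(const struct co_list *l)
{
	size_t n = 0;
	for (const struct co *c = l->head.next; c != &l->head; c = c->next)
		n++;
	return n;
}
```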


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.
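The sketch below models the selection rule described above. wakeup() releases every sleeper on a wait channel, while wakeup_one() releases only the best-priority one. The sketch makes three assumptions: a numerically lower priority is better, as in traditional BSD; sleepers are kept in a plain array rather than the kernel's sleep queues; and ties go to the earliest sleeper. struct sleeper and the sketch_* functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sleeper record; lower prio value means higher priority. */
struct sleeper {
	const void *wchan;	/* wait channel; NULL once woken */
	int prio;
	int woken;
};

/* wakeup(): wake every sleeper on chan (the "thundering herd"). */
static size_t sketch_wakeup(struct sleeper *q, size_t n, const void *chan)
{
	size_t woke = 0;

	assert(chan != NULL);
	for (size_t i = 0; i < n; i++) {
		if (q[i].wchan == chan) {
			q[i].wchan = NULL;
			q[i].woken = 1;
			woke++;
		}
	}
	return woke;
}

/* wakeup_one(): wake only the best-priority sleeper on chan. The strict
 * '<' keeps the earliest sleeper when priorities are equal. */
static struct sleeper *sketch_wakeup_one(struct sleeper *q, size_t n,
    const void *chan)
{
	struct sleeper *best = NULL;

	assert(chan != NULL);
	for (size_t i = 0; i < n; i++) {
		if (q[i].wchan == chan &&
		    (best == NULL || q[i].prio < best->prio))
			best = &q[i];
	}
	if (best != NULL) {
		best->wchan = NULL;
		best->woken = 1;
	}
	return best;
}
```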


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock in the
proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broke.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing at for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.332 16-Dec-2019 ad

kpreempt_disabled(): softint LWPs aren't preemptable.


# 1.331 07-Dec-2019 ad

mi_switch: move an over eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller of mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread was stopped with ptrace(2), userland process must not
be able to unstop it deliberately or by an accident.

This was a Windows-style behavior that makes threading tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change where the kernel checks that a context switch does not happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because by that
point the offending thread has already been switched out, so a backtrace taken on
a KASSERT failure does not show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on an SMP machine with few processes running (e.g. while
running sysinst), pserialize could hang for a long time until all
CPUs got an LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never reached a state where it was
used and usable, due to reliability problems and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways an LWP can be migrated to another CPU, and in
all cases the scheduler postpones the migration if the target LWP is running. One
example of LWP migration is load balancing: the scheduler periodically
looks for CPU-hogging LWPs and schedules them for migration (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag and, if it is set on an LWP,
does not schedule that LWP. A scheduled LWP is actually migrated when it leaves
its CPU, i.e., in mi_switch, and mi_switch does NOT check the LP_BOUND flag.
So if an LWP is scheduled for migration first and only then sets the LP_BOUND
flag, it can be migrated regardless of the flag. To avoid this race condition,
we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html
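The race above comes down to one missing check at switch time. What follows is a minimal sketch, assuming heavily reduced hypothetical structures: struct cpu_sk, struct lwp_sk, LP_BOUND_SK and the *_sk helpers are illustrative stand-ins, not NetBSD's struct lwp, curlwp_bind() or mi_switch(). It shows a migration requested through l_target_cpu being honoured only when the LWP is not bound. Leaving the pending target in place for a bound LWP is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag value; the real LP_BOUND is defined in sys/lwp.h. */
#define LP_BOUND_SK 0x1u

struct cpu_sk {
	int id;
};

/* Hypothetical, reduced LWP: only the fields involved in the race. */
struct lwp_sk {
	unsigned       l_pflags;     /* LWP-private flags (LP_BOUND_SK) */
	struct cpu_sk *l_cpu;        /* CPU the LWP currently runs on */
	struct cpu_sk *l_target_cpu; /* migration requested by the balancer */
};

/* Sketch of curlwp_bind(): pin the LWP, returning the previous state. */
static unsigned
bind_sk(struct lwp_sk *l)
{
	unsigned old = l->l_pflags & LP_BOUND_SK;

	l->l_pflags |= LP_BOUND_SK;
	return old;
}

/* Sketch of curlwp_bindx(): restore the previous binding state. */
static void
bindx_sk(struct lwp_sk *l, unsigned old)
{
	assert(old == 0 || old == LP_BOUND_SK);
	l->l_pflags = (l->l_pflags & ~LP_BOUND_SK) | old;
}

/*
 * Switch-time migration step. The balancer may have set l_target_cpu
 * before the LWP bound itself, so the flag is re-checked here. A bound
 * LWP stays on its CPU and keeps the pending target for later.
 */
static void
switch_migrate_sk(struct lwp_sk *l)
{
	if (l->l_target_cpu != NULL && (l->l_pflags & LP_BOUND_SK) == 0) {
		l->l_cpu = l->l_target_cpu;
		l->l_target_cpu = NULL;
	}
}
```

Replaying the problematic order (balancer schedules the LWP first, the LWP binds afterwards) leaves it on its original CPU until it unbinds.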


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
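To make "composite wait(2) status" concrete, the three separate fields can be folded back into one status word. This is a minimal sketch, assuming the traditional BSD layout: exit code in bits 8-15, terminating signal in the low 7 bits, and a core-dump flag of 0200. The helper name compose_wait_status and the CORE_FLAG_SK constant are hypothetical, and the stopped-process encoding (low byte 0177) is deliberately not covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/wait.h>

/* Core-dump bit in a wait(2) status word (0200 in the traditional layout). */
#define CORE_FLAG_SK 0200

/*
 * Hypothetical helper: combine an exit code, a signal number and a
 * core-dump indication into one wait(2)-style status word.
 */
static int
compose_wait_status(int xexit, int xsig, bool cored)
{
	assert(xsig >= 0 && xsig < 0177);	/* 0177 marks "stopped" */
	if (xsig != 0)
		return xsig | (cored ? CORE_FLAG_SK : 0);
	return (xexit & 0xff) << 8;
}
```

The standard WIFEXITED/WEXITSTATUS and WIFSIGNALED/WTERMSIG macros decode the result, which is why a single int could previously carry all three meanings depending on context.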


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.
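Roughly, passive serialization lets a writer unlink an object and free it only once every CPU has passed through a context switch, since readers must not switch inside a read section. Below is a minimal single-threaded sketch of that grace-period bookkeeping, assuming hypothetical per-CPU switch counters. NCPU_SK, switchpoint_sk() and the grace_* helpers are illustrative names, not the pserialize(9) API or its implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCPU_SK 4

/* Per-CPU context-switch counters, bumped at every switch point. */
static uint64_t switch_count_sk[NCPU_SK];

static void
switchpoint_sk(int cpu)
{
	assert(cpu >= 0 && cpu < NCPU_SK);
	switch_count_sk[cpu]++;
}

/* Snapshot taken by the writer after unlinking the object. */
struct grace_sk {
	uint64_t snap[NCPU_SK];
};

static void
grace_begin_sk(struct grace_sk *g)
{
	for (int i = 0; i < NCPU_SK; i++)
		g->snap[i] = switch_count_sk[i];
}

/* True once every CPU has switched since the snapshot: safe to free. */
static bool
grace_done_sk(const struct grace_sk *g)
{
	for (int i = 0; i < NCPU_SK; i++)
		if (switch_count_sk[i] == g->snap[i])
			return false;
	return true;
}
```

The sketch also shows why a CPU that never reaches a switch point stalls the writer indefinitely, the hang that revision 1.317 above addresses by calling pserialize_switchpoint() even when mi_switch() does not switch.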


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptibility.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.
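The hz/n problem is plain integer arithmetic: division truncates, so once hz drops below n the computed timeout becomes 0, which kpause() interprets as "no timeout". A minimal sketch follows; usec2ticks_sk() is a hypothetical stand-in for the usec2ticks() routine wished for above, rounding up so that any non-zero interval yields at least one tick.

```c
#include <assert.h>
#include <stdint.h>

/* The "hz / n" idiom: a 1/n-second timeout computed with integer math. */
static int
timo_hz_div_sk(int hz, int n)
{
	return hz / n;	/* truncates: 0 whenever hz < n */
}

/*
 * Hypothetical usec2ticks(): microseconds to clock ticks, rounding up
 * so that a non-zero interval never collapses to 0 ("sleep forever").
 */
static int
usec2ticks_sk(int hz, int64_t usec)
{
	assert(hz > 0 && usec >= 0);
	return (int)((usec * hz + 999999) / 1000000);
}
```

For example, with hz = 64 a caller asking for hz / 100 (10 ms) gets 0 and would sleep indefinitely, while usec2ticks_sk(64, 10000) gives 1 tick.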


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu; it should already be correct on return (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of cv_wakeup().


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodic balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into a separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.
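The three deferral conditions listed above can be read as a single predicate. What follows is a minimal sketch, assuming a hypothetical per-LWP state struct: the field names, the *_sk helpers and the value of IPL_NONE_SK are illustrative, not the real struct lwp, kpreempt_disable(9) internals or the machine-dependent IPL values.

```c
#include <assert.h>
#include <stdbool.h>

#define IPL_NONE_SK 0	/* assumed: lowest interrupt priority level */

/* Hypothetical state consulted before preempting a kernel thread. */
struct kpreempt_state_sk {
	int kernel_lock_depth;	/* > 0 while holding kernel_lock */
	int nopreempt;		/* kpreempt_disable() nesting depth */
	int ipl;		/* current interrupt priority level */
};

static void
kpreempt_disable_sk(struct kpreempt_state_sk *s)
{
	s->nopreempt++;
}

static void
kpreempt_enable_sk(struct kpreempt_state_sk *s)
{
	assert(s->nopreempt > 0);	/* unbalanced enable is a bug */
	s->nopreempt--;
}

/* True if a pending kernel preemption must be deferred. */
static bool
kpreempt_deferred_sk(const struct kpreempt_state_sk *s)
{
	return s->kernel_lock_depth > 0	/* code may not be MT safe */
	    || s->nopreempt > 0		/* explicit critical section */
	    || s->ipl > IPL_NONE_SK;	/* running at raised IPL */
}
```

Counting nesting depth rather than using a boolean lets kpreempt_disable/kpreempt_enable pairs nest, with preemption allowed again only when the outermost section ends.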


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case when l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptable. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forwards branches which are not
usually taken for the old code, and one forwards branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symbol definition error
due to code duplication by the gcc4 optimizer. Also add a note about this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
on each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 Hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
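
The starvation effect described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the kernel code: loadfactor() and decay_cpu() are modelled on the classic 4.4BSD scheduler macros, FSHIFT is taken as 11, ESTCPU_SHIFT is a hypothetical number of fraction bits, and the workload (load average 16, one schedclock() tick every other second, roughly what 32 runnable processes sharing a 16 Hz schedclock would see) is invented for illustration.

```c
#include <stdint.h>

/* Assumption: fixed-point load average with 11 fraction bits. */
#define FSHIFT 11
#define FSCALE (1u << FSHIFT)

/* Hypothetical number of fraction bits kept in a fixed-point estcpu. */
#define ESTCPU_SHIFT 11

/* Modelled on the classic 4.4BSD loadfactor() macro. */
static uint64_t loadfactor(uint64_t loadav)
{
	return 2 * loadav;
}

/* Modelled on decay_cpu(): cpu * loadfac / (loadfac + 1), in fixed point. */
static uint64_t decay_cpu(uint64_t loadfac, uint64_t cpu)
{
	return (loadfac * cpu) / (loadfac + FSCALE);
}

/*
 * Simulate one process for `secs` seconds under a whole-number load
 * average `load`. The process earns one schedclock() tick every
 * `period` seconds, and schedcpu() decays estcpu once per second.
 * `shift` is the number of fraction bits kept in estcpu: 0 models the
 * old integer p_estcpu, ESTCPU_SHIFT models a fixed-point one.
 * Returns estcpu truncated to whole units.
 */
static uint64_t simulate_estcpu(unsigned load, unsigned period,
    unsigned secs, unsigned shift)
{
	uint64_t loadfac = loadfactor((uint64_t)load << FSHIFT);
	uint64_t estcpu = 0;

	for (unsigned s = 0; s < secs; s++) {
		if (s % period == 0)
			estcpu += (uint64_t)1 << shift;	/* schedclock(): +1 */
		estcpu = decay_cpu(loadfac, estcpu);	/* schedcpu(): decay */
	}
	return estcpu >> shift;
}
```

With the integer model each one-tick gain is truncated away by the very next decay, so estcpu never leaves 0; with 11 fraction bits the same workload settles at roughly 15 whole units.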


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.
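
A minimal sketch of the two bit orderings for a run-queue bitmap like sched_whichqs, assuming 32 queues where a lower queue number means higher priority. The helper names and NQS value are hypothetical; real MD code would replace the loop with a find-first-set operation (LSB order) or a count-leading-zeros instruction such as PowerPC's cntlzw (MSB order).

```c
#include <stdint.h>

#define NQS 32	/* assumption: 32 run queues, one bit each */

/* LSB order (the default): queue q is bit q counted from the least significant end. */
static uint32_t whichqs_bit_lsb(unsigned q)
{
	return UINT32_C(1) << q;
}

/* MSB order: queue q is bit q counted from the most significant end. */
static uint32_t whichqs_bit_msb(unsigned q)
{
	return UINT32_C(0x80000000) >> q;
}

/*
 * Return the lowest-numbered (highest-priority) non-empty queue, or -1
 * if the bitmap is empty. In LSB order this is a find-first-set; in MSB
 * order it equals the number of leading zero bits in the word.
 */
static int first_queue(uint32_t whichqs, int msb_order)
{
	for (unsigned q = 0; q < NQS; q++) {
		uint32_t bit = msb_order ? whichqs_bit_msb(q) : whichqs_bit_lsb(q);

		if (whichqs & bit)
			return (int)q;
	}
	return -1;
}
```

In MSB order the index of the first non-empty queue is exactly the leading-zero count of the word, which is why a single count-leading-zeros instruction is enough on machines that have one.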


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve the comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get the appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Also stop runnable but swapped-out user processes in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process, so we can't restart scheduling;
this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes except curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all CPUs rather
than from a timer callout; this allows time-slicing on non-primary CPUs.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.
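
One reading of the wakeup_one() description, sketched as a linear scan over a hypothetical array of sleepers (the real kernel kept hashed sleep queues, and the struct and function names here are invented). As in BSD, a lower number means a higher priority; among sleepers of equal priority, the one earliest in line wins. wakeup() wakes every sleeper on the channel, which is the thundering herd the entry mentions, while wakeup_one() wakes at most one.

```c
#include <stddef.h>

/* Hypothetical, simplified sleeper record; not struct proc. */
struct sleeper {
	const void *wchan;	/* wait channel (identifier) slept on */
	int pri;		/* sleep priority: lower value = higher priority */
	int asleep;		/* nonzero while still sleeping */
};

/* Wake every sleeper on wchan, like wakeup(). Returns how many woke. */
static size_t wakeup_sketch(struct sleeper *q, size_t n, const void *wchan)
{
	size_t woken = 0;

	for (size_t i = 0; i < n; i++) {
		if (q[i].asleep && q[i].wchan == wchan) {
			q[i].asleep = 0;
			woken++;
		}
	}
	return woken;
}

/*
 * Wake only the highest-priority sleeper on wchan; ties go to the entry
 * earliest in the array (first in line). Returns its index, or -1 if
 * nothing is sleeping on wchan.
 */
static long wakeup_one_sketch(struct sleeper *q, size_t n, const void *wchan)
{
	long best = -1;

	for (size_t i = 0; i < n; i++) {
		if (!q[i].asleep || q[i].wchan != wchan)
			continue;
		if (best < 0 || q[i].pri < q[best].pri)
			best = (long)i;
	}
	if (best >= 0)
		q[best].asleep = 0;
	return best;
}
```

Repeated wakeup_one_sketch() calls on the same channel hand the wakeups out one at a time in priority order, leaving sleepers on other channels untouched.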


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.
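The 0.75 figure follows from a fixed-point argument. Assume the 4.4BSD once-per-second update p_estcpu = decay * p_estcpu + nice, with decay = 2*load / (2*load + 1). For a load average of one, decay = 2/3. The estimate therefore settles where e = (2/3)e + nice, giving e = 3*nice. Its p_estcpu/4 share of the priority is then 0.75*nice. The floating-point simulation below implements that assumed rule. It is not kernel code, and the function names are hypothetical.

```c
/* Per-second decay factor for a given load average (assumed 4.4BSD
 * formula: 2*load / (2*load + 1)). */
static double
decay_factor(double loadav)
{
	return (2.0 * loadav) / (2.0 * loadav + 1.0);
}

/* Simulate the removed behaviour: every second the estimate is
 * decayed and the nice value is added to it again. Returns the extra
 * priority (estcpu / 4) after the given number of seconds. */
static double
nice_penalty_after(double nice, double loadav, int seconds)
{
	double estcpu = 0.0;
	double d = decay_factor(loadav);

	for (int s = 0; s < seconds; s++)
		estcpu = d * estcpu + nice;
	return estcpu / 4.0;
}
```

For nice = 20 at a load of one, the penalty starts at 5 after the first update and approaches 15 (0.75 * 20).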

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.
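A 128-signal set fits naturally in four 32-bit words. The sketch below uses hypothetical names (my_sigset_t and friends), not NetBSD's real sigset_t macros. Signals are numbered from 1, as usual.

```c
#include <stdint.h>
#include <string.h>

#define MY_NSIG 128  /* valid signals are 1..MY_NSIG */

typedef struct {
	uint32_t bits[4];  /* 4 x 32 bits = 128 signals */
} my_sigset_t;

/* Word index and bit mask for a 1-based signal number. */
#define MY_SIGWORD(sig) (((unsigned)(sig) - 1) >> 5)
#define MY_SIGMASK(sig) (UINT32_C(1) << (((unsigned)(sig) - 1) & 31))

static void
my_sigemptyset(my_sigset_t *set)
{
	memset(set, 0, sizeof(*set));
}

static void
my_sigaddset(my_sigset_t *set, int sig)
{
	set->bits[MY_SIGWORD(sig)] |= MY_SIGMASK(sig);
}

static int
my_sigismember(const my_sigset_t *set, int sig)
{
	return (set->bits[MY_SIGWORD(sig)] & MY_SIGMASK(sig)) != 0;
}
```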


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
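With NZERO = 20, the user-visible nice range -20..+20 is stored as 0..40, which always fits in an unsigned char. The two conversion helpers below are hypothetical illustrations of that bias, not the kernel's code.

```c
#ifndef NZERO
#define NZERO 20
#endif
#ifndef PRIO_MIN
#define PRIO_MIN (-20)
#endif
#ifndef PRIO_MAX
#define PRIO_MAX 20
#endif

/* Store a user nice value, clamped to PRIO_MIN..PRIO_MAX, biased by
 * NZERO so the stored value is never negative. */
static unsigned char
nice_to_stored(int nice)
{
	if (nice < PRIO_MIN)
		nice = PRIO_MIN;
	if (nice > PRIO_MAX)
		nice = PRIO_MAX;
	return (unsigned char)(nice + NZERO);
}

/* Recover the user-visible nice value from the stored p_nice. */
static int
stored_to_nice(unsigned char p_nice)
{
	return (int)p_nice - NZERO;
}
```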


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.331 07-Dec-2019 ad

mi_switch: move an over eager KASSERT defeated by kernel preemption.
Discovered during automated test.


# 1.330 07-Dec-2019 ad

mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.
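The usual cure for false sharing is to give each field that remote CPUs write its own cache line. The sketch below assumes 64-byte lines and GCC/Clang's aligned attribute. The struct is hypothetical and does not reproduce the real struct cpu_info layout.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64  /* assumption: typical x86 line size */
#define CACHELINE_ALIGNED __attribute__((aligned(CACHE_LINE_SIZE)))

/* Hypothetical per-CPU structure. Fields written by other CPUs (IPI
 * mask, reschedule request) are kept off the line holding fields the
 * owning CPU reads on every context switch. */
struct percpu_example {
	void     *curlwp;                          /* read by owner */
	void     *onproc;                          /* read by owner */
	uint32_t  ipi_pending CACHELINE_ALIGNED;   /* written remotely */
	uint32_t  want_resched CACHELINE_ALIGNED;  /* written remotely */
};

_Static_assert(offsetof(struct percpu_example, ipi_pending) %
    CACHE_LINE_SIZE == 0, "ipi_pending must start a cache line");
_Static_assert(offsetof(struct percpu_example, want_resched) -
    offsetof(struct percpu_example, ipi_pending) >= CACHE_LINE_SIZE,
    "remotely written fields must not share a cache line");
```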


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread has been stopped with ptrace(2), a userland process must not
be able to unstop it, deliberately or by accident.

This was Windows-style behavior that made thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point the suspect thread has already been switched out, so a backtrace taken on
a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to a state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into kernel in first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP can be migrated to another CPU, and
in every case the scheduler postpones a migration if the target LWP is running.
One example of LWP migration is load balancing: the scheduler periodically
looks for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it is set on an LWP,
the scheduler doesn't schedule that LWP. The scheduler tries to migrate a
scheduled LWP when it is leaving a running CPU, i.e., in mi_switch. And mi_switch
does NOT check the LP_BOUND flag. So if an LWP is scheduled first and then sets
the LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t
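C99 designated initializers name each member. The table therefore survives reordering of the struct, and members left out are zero-initialized. The struct below is a cut-down, hypothetical stand-in for syncobj_t, not its real definition.

```c
#include <stddef.h>

struct lwp;  /* opaque for this sketch */

/* Hypothetical, simplified sync-object operations table. */
typedef struct syncobj_example {
	int   sobj_flag;
	void (*sobj_unsleep)(struct lwp *, int);
	void (*sobj_changepri)(struct lwp *, int);
	struct lwp *(*sobj_owner)(const void *);
} syncobj_example_t;

static void
example_unsleep(struct lwp *l, int cleanup)
{
	(void)l;
	(void)cleanup;
}

static void
example_changepri(struct lwp *l, int pri)
{
	(void)l;
	(void)pri;
}

/* In C the designators need not follow declaration order, and
 * sobj_owner, which is not listed, is initialized to NULL. */
static const syncobj_example_t example_syncobj = {
	.sobj_changepri = example_changepri,
	.sobj_unsleep   = example_unsleep,
	.sobj_flag      = 0,
};
```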


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
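Recombining the split fields into a traditional wait(2) status word might look like the sketch below. It assumes the common BSD encoding: exit code in bits 8..15, terminating signal in the low 7 bits, and core-dump flag 0200. make_wait_status() is a hypothetical helper, not the kernel's code.

```c
#include <stdbool.h>

#define EXAMPLE_WCOREFLAG 0200  /* core-dump bit, assumed BSD value */

/* Hypothetical: build a wait(2)-style status from the split fields.
 * xsig == 0 means the process exited normally with code xexit. */
static int
make_wait_status(int xexit, int xsig, bool cored)
{
	if (xsig != 0)
		return (xsig & 0177) | (cored ? EXAMPLE_WCOREFLAG : 0);
	return (xexit & 0377) << 8;
}
```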


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html
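The userland approximation below shows the variadic form. The names assert_fail and MY_KASSERTMSG are hypothetical, and the real KASSERTMSG is compiled in only for DIAGNOSTIC kernels. The message is a printf-style format plus optional arguments, printed after the failed expression.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical panic path: report the assertion and abort. */
static void assert_fail(const char *, const char *, int, const char *, ...)
    __attribute__((__format__(__printf__, 4, 5), __noreturn__));

static void
assert_fail(const char *expr, const char *file, int line,
    const char *fmt, ...)
{
	va_list ap;

	fprintf(stderr, "kernel diagnostic assertion \"%s\" failed: "
	    "file \"%s\", line %d: ", expr, file, line);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	abort();
}

/* The format string and any extra arguments travel together through
 * __VA_ARGS__, so both MY_KASSERTMSG(e, "msg") and
 * MY_KASSERTMSG(e, "x=%d", x) are valid C99. */
#define MY_KASSERTMSG(e, ...)						\
	do {								\
		if (!(e))						\
			assert_fail(#e, __FILE__, __LINE__, __VA_ARGS__); \
	} while (0)
```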


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptability.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.
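The hz/n pitfall is easy to reproduce. With hz = 100, hz/200 truncates to 0, which kpause() reads as "no timeout". A conversion that rounds up never turns a non-zero duration into 0 ticks. usec_to_ticks() below is a hypothetical helper in the spirit of the usec2ticks() routine mentioned above, not an existing kernel function. It assumes hz > 0.

```c
#include <stdint.h>

/* Hypothetical: convert microseconds to clock ticks at rate hz,
 * rounding up so any non-zero duration yields at least one tick.
 * A zero duration stays 0, i.e. an explicit "no timeout". */
static int
usec_to_ticks(uint64_t usec, int hz)
{
	if (usec == 0)
		return 0;
	return (int)((usec * (uint64_t)hz + 999999) / 1000000);
}
```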


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync
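The two arrays must stay in step because each load-average slot is decayed with its own constant. This sketch assumes the traditional fixed-point scheme (FSHIFT = 11) and decay constants exp(-5/60), exp(-5/300) and exp(-5/900) for a 5-second sampling interval. It uses C11 _Static_assert in place of the kernel's CTASSERT. The arrays are named ld_cexp and ld_avg here, after the kernel's cexp and ldavg, to avoid clashing with the C library's cexp().

```c
#include <stddef.h>
#include <stdint.h>

#define FSHIFT 11
#define FSCALE (1 << FSHIFT)
#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

typedef uint32_t fixpt_t;

/* Decay constants for the 1-, 5- and 15-minute averages, sampled
 * every 5 seconds (assumed values). */
static const fixpt_t ld_cexp[] = {
	(fixpt_t)(0.9200444146293232 * FSCALE),
	(fixpt_t)(0.9834714538216174 * FSCALE),
	(fixpt_t)(0.9944598480048967 * FSCALE),
};

static fixpt_t ld_avg[3];

/* Compile-time check: exactly one decay constant per average. */
_Static_assert(ARRAY_COUNT(ld_cexp) == ARRAY_COUNT(ld_avg),
    "ld_cexp[] and ld_avg[] must have the same length");

/* Fold the current number of runnable threads into every average. */
static void
loadavg_update(unsigned int nrun)
{
	for (size_t i = 0; i < ARRAY_COUNT(ld_avg); i++)
		ld_avg[i] = (ld_cexp[i] * ld_avg[i] +
		    (fixpt_t)nrun * FSCALE * (FSCALE - ld_cexp[i])) >> FSHIFT;
}
```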


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.
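The pattern replaces #ifdef blocks at every call site with one inline function per hook. When the option is off, the hook's body is empty and the compiler can drop the call. This sketch uses hypothetical names: EXAMPLE_DTRACE_HOOKS stands in for KDTRACE_HOOKS, and __predict_false is mapped to GCC's __builtin_expect() if not already defined.

```c
#include <stddef.h>

#ifndef __predict_false
#define __predict_false(exp) __builtin_expect((exp) != 0, 0)
#endif

/* EXAMPLE_DTRACE_HOOKS is deliberately left undefined: option off. */

#ifdef EXAMPLE_DTRACE_HOOKS
extern void (*example_probe_func)(int);  /* set when a probe is armed */

static inline void
example_probe_hook(int arg)
{
	if (__predict_false(example_probe_func != NULL))
		example_probe_func(arg);
}
#else
/* Option disabled: an empty inline that optimizes away entirely. */
static inline void
example_probe_hook(int arg)
{
	(void)arg;
}
#endif

/* Call sites stay free of #ifdefs. */
static int
example_syscall(int arg)
{
	example_probe_hook(arg);
	return arg * 2;
}
```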


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu, causing
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2, the implementation of a new scheduler based on the
original SVR4 approach, with some inspiration for balancing and migration
taken from Solaris. It implements per-CPU runqueues, provides real-time (RT)
and time-sharing (TS) queues, is ready to support the POSIX real-time
extensions, and is also prepared for CPU affinity support.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches which are not
usually taken for the old code, and one forward branch that is usually taken
for my 'improved version'. Since (IIRC) both Athlon and P4 predict
forward branches 'not taken', the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable the asm statement which defines the bpendtsleep symbol as a "handy
breakpoint" on all m68k ports, since code duplication by the gcc4 optimizer
may cause a multiple symbol definition error. Also note this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve the comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use the SIGACTION() macro to get the appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process, so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes except curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all CPUs rather
than from a timer callout; this allows time-slicing on non-primary CPUs.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (nearly nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.329 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller of mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread was stopped with ptrace(2), userland process must not
be able to unstop it deliberately or by an accident.

This was a Windows-style behavior that makes thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point the suspect thread has already been switched out, so a backtrace obtained on
a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP can be migrated to another CPU, and in
all cases the scheduler postpones a migration if the target LWP is running. One
example of LWP migration is load balancing; the scheduler periodically
looks for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it is set on an LWP,
the scheduler doesn't schedule that LWP. The scheduler tries to migrate a scheduled
LWP when it is leaving a running CPU, i.e., in mi_switch. And mi_switch does NOT check
the LP_BOUND flag. So if an LWP is scheduled first and then it sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptibility.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodic balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into a separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu, causing
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspiration for balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared to support CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches which are not
usually taken for the old code, and one forward branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forward branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the processes' profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symbol definition error
by code duplication of gcc4 optimizer. Also note about this in comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is not likely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve the comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get the appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the P_SUSPEND
flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all CPUs rather
than from a timer callout; this allows time-slicing on non-primary CPUs.
- Make pscnt per-CPU.
- Notice psdiv changes on each CPU, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each CPU
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which CPU a process is running on/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) them.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this; it makes the behavior hard to control, almost impossible to analyze,
and the effect (~nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS IDs, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.328 03-Dec-2019 riastradh

Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread was stopped with ptrace(2), userland process must not
be able to unstop it deliberately or by an accident.

This was Windows-style behavior that made thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point a suspect thread is already switched so that a backtrace gotten on
a KASSERT failure doesn't point out where a context switch happens.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to a state where it would be
used and usable, due to reliability problems and limited & complicated MD support.

Going forward, we need to concentrate on interfaces that do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag to l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways an LWP can be migrated to another CPU, and in
all cases the scheduler postpones the migration if the target LWP is running. One
example of LWP migration is load balancing: the scheduler periodically
looks for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it is set on an LWP,
the scheduler does not schedule that LWP. Migration of a scheduled LWP is attempted
when it leaves a running CPU, i.e., in mi_switch. But mi_switch does NOT check
the LP_BOUND flag. So if an LWP is scheduled first and then sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
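With the exit code, signal number, and core flag stored separately, the composite status that wait(2) reports has to be rebuilt from them. A minimal sketch, assuming the traditional BSD bit layout (the encoding described by the historical W_EXITCODE and WCOREFLAG macros); the `sketch_*` names are hypothetical, and this is not NetBSD's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helpers using the traditional BSD wait(2) status layout:
 * exit code in bits 8..15, terminating signal in bits 0..6, and the
 * core-dump flag in bit 7 (octal 0200).
 */
#define SKETCH_WCOREFLAG 0200
static_assert(SKETCH_WCOREFLAG == 0x80, "core flag is bit 7");

/* Recombine the separate fields into one composite status word. */
static int
sketch_compose_status(int xexit, int xsig, bool coredumped)
{
	int status = ((xexit & 0xff) << 8) | (xsig & 0x7f);

	if (coredumped)
		status |= SKETCH_WCOREFLAG;
	return status;
}

/* Decode the fields back out of the composite word. */
static int  sketch_exit_code(int status) { return (status >> 8) & 0xff; }
static int  sketch_signal(int status)    { return status & 0x7f; }
static bool sketch_core(int status)      { return (status & SKETCH_WCOREFLAG) != 0; }
```

A normal exit leaves the signal field zero, while a fatal signal leaves the exit-code field zero, so the fields never overlap.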


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptability.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo (integer division makes hz/n zero
once hz < n). This leads to unwanted behaviour when hz gets below some
impossible-to-know limit. With a usec2ticks() routine it would at least
be a little more tolerable.
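The pitfall can be shown in a few lines of C. This is a hedged sketch, not kpause(9) itself: the `sketch_*` functions are hypothetical models of the `timo` and `intr` arguments, and the at-least-one-tick guard is one possible mitigation, not what the kernel does.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the kpause() timeout rules described above.
 * A timeout of 0 ticks means "no timeout": the caller sleeps until woken.
 */

/* The hz/n idiom: integer division yields 0 whenever hz < n. */
static int
sketch_timo_from_divisor(int hz, int n)
{
	assert(n > 0);
	return hz / n;
}

/* The condition the KASSERT rejects: an unbounded, uninterruptible sleep. */
static bool
sketch_kpause_args_ok(int timo, bool intr)
{
	return timo != 0 || intr;
}

/* A hypothetical guard: never let the idiom round down to "forever". */
static int
sketch_timo_at_least_one(int hz, int n)
{
	int timo = sketch_timo_from_divisor(hz, n);

	return timo > 0 ? timo : 1;
}
```

For example, with hz = 64 a caller asking for hz/100 gets 0 ticks and would sleep indefinitely; the guarded version yields a one-tick sleep instead.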


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync
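The compile-time check keeps one decay factor per load-average slot. A minimal C11 sketch, assuming 5-second sampling, decay factors exp(-5/60), exp(-5/300) and exp(-5/900) for the 1, 5 and 15 minute averages, and a fixed-point scale of 2^11 modeled on the BSD FSHIFT/FSCALE idiom; `sketch_cexp`, `struct sketch_loadavg` and `sketch_loadav_update()` are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FSHIFT 11			/* hypothetical fixed-point shift */
#define SKETCH_FSCALE (1 << SKETCH_FSHIFT)
#define SKETCH_ARRAYCOUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Per-sample decay factors for the 1, 5 and 15 minute averages. */
static const uint32_t sketch_cexp[] = {
	(uint32_t)(0.9200444146293232 * SKETCH_FSCALE),	/* exp(-5/60)  */
	(uint32_t)(0.9834714538216174 * SKETCH_FSCALE),	/* exp(-5/300) */
	(uint32_t)(0.9944598480048967 * SKETCH_FSCALE),	/* exp(-5/900) */
};

struct sketch_loadavg {
	uint32_t ldavg[3];			/* fixed-point averages */
};

/* The CTASSERT equivalent: both arrays must have the same length. */
static_assert(SKETCH_ARRAYCOUNT(sketch_cexp) ==
    SKETCH_ARRAYCOUNT(((struct sketch_loadavg *)0)->ldavg),
    "cexp[] and ldavg[] must stay in sync");

/* One sample: avg = avg * c + nrun * (1 - c), all in fixed point. */
static void
sketch_loadav_update(struct sketch_loadavg *la, uint32_t nrun)
{
	for (size_t i = 0; i < SKETCH_ARRAYCOUNT(sketch_cexp); i++) {
		uint64_t c = sketch_cexp[i];

		la->ldavg[i] = (uint32_t)((c * la->ldavg[i] +
		    (uint64_t)nrun * SKETCH_FSCALE * (SKETCH_FSCALE - c)) >>
		    SKETCH_FSHIFT);
	}
}
```

If someone added a fourth average to one array but not the other, the build would fail instead of the loop silently reading past the end of the shorter array.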


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodic balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.
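The three deferral conditions listed above amount to a simple predicate. A minimal sketch, assuming a hypothetical `struct sketch_preempt_state` that tracks kernel_lock ownership, kpreempt_disable() nesting, and the current IPL; the real kernel keeps this state in different places and checks more than this.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_IPL_NONE 0	/* hypothetical: lowest IPL, nothing blocked */

/* Hypothetical snapshot of the state that can defer preemption. */
struct sketch_preempt_state {
	int kernel_lock_count;	/* > 0 while kernel_lock is held */
	int nopreempt;		/* kpreempt_disable() nesting depth */
	int ipl;		/* current interrupt priority level */
};

/* True if any of the three deferral conditions from the log entry holds. */
static bool
sketch_preemption_deferred(const struct sketch_preempt_state *s)
{
	if (s->kernel_lock_count > 0)	/* code may not be MT safe */
		return true;
	if (s->nopreempt > 0)		/* inside a kpreempt_disable() section */
		return true;
	return s->ipl > SKETCH_IPL_NONE;	/* IPL raised above IPL_NONE */
}

/* Critical-section brackets that nest, like kpreempt_disable/enable. */
static void
sketch_kpreempt_disable(struct sketch_preempt_state *s)
{
	s->nopreempt++;
}

static void
sketch_kpreempt_enable(struct sketch_preempt_state *s)
{
	assert(s->nopreempt > 0);
	s->nopreempt--;
}
```

The sketch counts nesting rather than using a boolean, so an inner enable cannot re-allow preemption while an outer disable is still active.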


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2, the implementation of a new scheduler which is based
on the original approach of SVR4, with some inspiration for balancing
and migration drawn from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared to support CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. Requiring it breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches which are not
usually taken for the old code, and one forward branch that is usually taken
for my 'improved version'. Since (IIRC) both Athlon and P4 will predict
forward branches 'not taken', the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.
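
Illustrative sketch, not the kernel's code: "canonicalising" an accumulated
run time means keeping the sub-second field in range after an interval is
added. This assumes a struct timeval-style l_rtime (seconds plus
microseconds); the helper name rtime_add_interval is hypothetical.

```c
#include <sys/time.h>

/*
 * Hypothetical helper: add the interval from *start to *stop to *rtime,
 * keeping tv_usec in the canonical range [0, 1000000).
 * Assumes stop >= start and that *rtime is already canonical.
 */
static void
rtime_add_interval(struct timeval *rtime, const struct timeval *start,
    const struct timeval *stop)
{
	long usec = (long)rtime->tv_usec + (long)(stop->tv_usec - start->tv_usec);
	time_t sec = rtime->tv_sec + (stop->tv_sec - start->tv_sec);

	/* Borrow a second if the microsecond difference went negative. */
	while (usec < 0) {
		usec += 1000000;
		sec--;
	}
	/* Carry whole seconds out of the microsecond field. */
	while (usec >= 1000000) {
		usec -= 1000000;
		sec++;
	}
	rtime->tv_sec = sec;
	rtime->tv_usec = usec;
}
```

For example, adding the 0.3 s interval from 10.800000 to 11.100000 to a run
time of 1.900000 yields 2.200000. When start and stop fall in the same second
the borrow loop cannot run, which is presumably the case 1.179 tried to
special-case.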


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symbol definition error
due to code duplication by the gcc4 optimizer. Also note this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
on each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
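
A simplified model of the problem described above, not NetBSD's actual code:
assume each schedcpu() pass multiplies p_estcpu by the 4.4BSD-style factor
2L / (2L + 1) for an integer load average L. With integer truncation a
non-zero value always loses at least one whole unit per pass, whereas with
extra fractional bits the loss can be a fraction of a unit. The names
decay_step and ESTCPU_FRAC_BITS, and the choice of 8 fractional bits, are
hypothetical.

```c
#include <stdint.h>

/* Hypothetical: number of fractional bits in the fixed-point model. */
#define ESTCPU_FRAC_BITS 8

/*
 * Hypothetical model of one once-per-second decay step:
 * estcpu * 2L / (2L + 1), truncated toward zero.
 * Pass a plain count for the integer case, or a value shifted left by
 * ESTCPU_FRAC_BITS for the fixed-point case.
 */
static uint32_t
decay_step(uint32_t estcpu, uint32_t loadavg)
{
	uint64_t loadfac = 2 * (uint64_t)loadavg;

	return (uint32_t)((loadfac * estcpu) / (loadfac + 1));
}
```

With L = 10, an integer estcpu of 5 decays to 4 (a whole unit lost), while
the fixed-point value 5 << 8 = 1280 decays to 1219, a loss of only 61/256 of
a unit. Under high load the fixed-point form therefore decays much more
slowly, which is the effect the log entry describes.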


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate over all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve comment for ltsleep(), the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get on appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes except curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all CPUs rather
than from a timer callout; this allows time-slicing on non-primary CPUs.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* function need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the header
files, as UVM still uses (most of) them.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broke.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing at all for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.327 01-Dec-2019 ad

Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread was stopped with ptrace(2), userland process must not
be able to unstop it deliberately or by an accident.

This was Windows-style behavior that made thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point the suspect thread has already been switched out, so a backtrace taken on
a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways an LWP can be migrated to another CPU, and in
all cases the scheduler postpones the migration if the target LWP is running.
One example of LWP migration is load balancing: the scheduler periodically
looks for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it is set on an
LWP, the scheduler doesn't schedule that LWP. The scheduler attempts to migrate
a scheduled LWP when it leaves the CPU it is running on, i.e., in mi_switch.
But mi_switch does NOT check the LP_BOUND flag. So if an LWP is scheduled first
and then sets the LP_BOUND flag, the LWP can be migrated regardless of the flag.
To avoid this race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsuya Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptability.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Workaround the case, when l_cpu changes to l_target_cpu, and causes
the locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches which are not
usually taken for the old code, and one forwards branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also be possible to add the times to the processes profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symble definition error
by code duplication of gcc4 optimizer. Also note about this in comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve the comment for ltsleep(); the previous formulation might
be misunderstood (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use the SIGACTION() macro to get the appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Also stop runnable but swapped-out user processes in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes except curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all cpu's rather
than from a timer callout; this allows time-slicing on non-primary cpu's.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a load average of one. I killed
this: it makes the behavior hard to control, almost impossible to analyze,
and the effect (~nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.326 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread was stopped with ptrace(2), userland process must not
be able to unstop it deliberately or by an accident.

This was a Windows-style behavior that makes thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point a suspect thread is already switched so that a backtrace gotten on
a KASSERT failure doesn't point out where a context switch happens.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP is migrated to another CPU and in
all cases the scheduler postpones a migration if a target LWP is running. One
example of LWP migration is load balancing; the scheduler periodically
explores CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag and, if it is set on an LWP,
the scheduler doesn't schedule the LWP. The scheduler tries to migrate a scheduled
LWP when it leaves a running CPU, i.e., in mi_switch. And mi_switch does NOT check
the LP_BOUND flag. So if an LWP is scheduled first and then it sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptibility.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.
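The hz/n pitfall described above comes down to integer arithmetic. With hz = 50, a caller asking for hz/100 gets 0 ticks, and kpause then reads that as "no timeout". The C sketch below shows the truncation, an interruptibility check, and one possible rounding guard. The helper names and the check are hypothetical and modeled on the log message, not on kpause(9)'s actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* The "hz / n" idiom: 1/n of a second in clock ticks (integer division). */
static int
ticks_hz_over_n(int hz, int n)
{
	assert(hz > 0 && n > 0);
	return hz / n;          /* truncates to 0 whenever hz < n */
}

/* Hypothetical guard in the spirit of rev 1.284: timo == 0 means
 * "no timeout", which is only safe if the sleep can be interrupted. */
static bool
pause_args_ok(int timo, bool intr)
{
	return timo != 0 || intr;
}

/* One possible mitigation: never let a nonzero request round down to 0. */
static int
ticks_hz_over_n_min1(int hz, int n)
{
	int t = ticks_hz_over_n(hz, n);
	return t > 0 ? t : 1;
}
```

Rounding up to one tick preserves the caller's intent of "a short, bounded sleep". A tick-conversion routine like the usec2ticks() mentioned in the log would avoid the idiom altogether.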


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- A few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.
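The three deferral conditions listed above combine into a single "may we preempt now?" test. The C sketch below assumes simplified per-LWP state. The struct, its fields, and the IPL constant are hypothetical, not the real lwp_t or cpu_info members. The event-counter and lockstat reporting mentioned above is omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_IPL_NONE 0   /* hypothetical: lowest interrupt priority level */

/* Hypothetical, simplified state consulted before preempting. */
struct preempt_sketch {
	bool holds_kernel_lock; /* code under kernel_lock is not MT safe */
	int  nopreempt;         /* kpreempt_disable() nesting depth */
	int  ipl;               /* current interrupt priority level */
};

/* Critical-section brackets, modeled on kpreempt_disable/enable. */
static void
sketch_kpreempt_disable(struct preempt_sketch *s)
{
	s->nopreempt++;
}

static void
sketch_kpreempt_enable(struct preempt_sketch *s)
{
	assert(s->nopreempt > 0);   /* an unbalanced enable is a bug */
	s->nopreempt--;
}

/* True if any of the three listed conditions forces a deferral. */
static bool
preemption_deferred(const struct preempt_sketch *s)
{
	return s->holds_kernel_lock ||
	    s->nopreempt > 0 ||
	    s->ipl > SKETCH_IPL_NONE;
}
```

Using a nesting counter rather than a flag lets critical sections nest. Preemption stays deferred until the outermost kpreempt_enable-style call.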


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlewlp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared to support CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlewlp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches which are not
usually taken for the old code, and one forward branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forward branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable the asm statement which defines the bpendtsleep symbol as a "handy
breakpoint" on all m68k ports, since code duplication by the gcc4 optimizer
may cause a multiple symbol definition error. Also note this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve comment for ltsleep(), the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get on appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes except curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all cpu's rather
than from a timer callout; this allows time-slicing on non-primary cpu's.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* function need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock during
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock every where else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic niceing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS IDs, take two. They're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.325 21-Nov-2019 ad

- Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.


Revision tags: phil-wifi-20191119
# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread has been stopped with ptrace(2), the userland process must not
be able to unstop it, deliberately or by accident.

This was Windows-style behavior that made thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because by that
point the suspect thread has already been switched out, so a backtrace obtained on
a KASSERT failure doesn't show where the context switch happened.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways an LWP can be migrated to another CPU, and in
every case the scheduler postpones the migration if the target LWP is running. One
example of LWP migration is load balancing: the scheduler periodically
looks for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it is set on an LWP,
the scheduler doesn't schedule that LWP. The scheduler attempts to migrate a
scheduled LWP when it leaves its CPU, i.e., in mi_switch. And mi_switch does NOT check
the LP_BOUND flag. So if an LWP is scheduled first and then sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptibility.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu; it should already be correct on return (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodic balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
need be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlewlp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspiration for balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches which are not
usually taken for the old code, and one forward branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forward branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symbol definition error
due to code duplication by the gcc4 optimizer. Also note about this in comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 Hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve comment for ltsleep(), the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get the appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN, and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all CPUs rather
than from a timer callout; this allows time-slicing on non-primary CPUs.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock in
the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the header files,
as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU (or even more, depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a load average of one. I killed
this; it makes the behavior hard to control, almost impossible to analyze,
and the effect (almost nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.324 03-Oct-2019 kamil

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread has been stopped with ptrace(2), the userland process must not
be able to unstop it, deliberately or by accident.

This was Windows-style behavior that made thread tracing fragile.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.323 03-Feb-2019 mrg

- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point a suspect thread is already switched so that a backtrace gotten on
a KASSERT failure doesn't point out where a context switch happens.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

branches: 1.315.2;
Remove emap support. Unfortunately it never got to a state where it would be
used and usable, due to reliability problems and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag in l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP is migrated to another CPU, and in
every case the scheduler postpones the migration if the target LWP is running. One
example of LWP migration is load balancing; the scheduler periodically
looks for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it is set on an LWP,
the scheduler doesn't schedule that LWP. The scheduler tries to migrate a scheduled
LWP when it is leaving a running CPU, i.e., in mi_switch. And mi_switch does NOT check
the LP_BOUND flag. So if an LWP is scheduled first and then it sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptability.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodic balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into a separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu, causing
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - an implementation of a new scheduler, based on the
original SVR4 approach with some inspiration for balancing and migration
from Solaris. It implements per-CPU runqueues, provides real-time (RT)
and time-sharing (TS) queues, is ready to support the POSIX real-time
extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forwards branches which are not
usually taken for the old code, and one forwards branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symble definition error
by code duplication of gcc4 optimizer. Also note about this in comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that it was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer duplicates the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve comment for ltsleep(), the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get on appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process, so we can't restart scheduling;
this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all CPUs rather
than from a timer callout; this allows time-slicing on non-primary CPUs.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a load average of one. I killed
this; it makes the behavior hard to control, almost impossible to analyze,
and the effect (~nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic niceing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


Revision tags: isaki-audio2-base
# 1.323 03-Feb-2019 mrg

- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.322 30-Nov-2018 mlelstv

The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.


# 1.321 28-Nov-2018 mlelstv

Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.


# 1.320 28-Nov-2018 mlelstv

Revert previous for a better fix.


# 1.319 28-Nov-2018 mlelstv

Fix statistics in case mi_switch didn't actually switch LWPs.


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.318 14-Aug-2018 ozaki-r

Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point a suspect thread is already switched so that a backtrace gotten on
a KASSERT failure doesn't point out where a context switch happens.


Revision tags: pgoyette-compat-0728
# 1.317 24-Jul-2018 bouyer

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.


# 1.316 12-Jul-2018 maxv

Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.315 19-May-2018 jdolecek

Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces that do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.314 16-Feb-2018 ozaki-r

branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag to l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP can be migrated to another CPU, and
in all cases the scheduler postpones a migration if the target LWP is running. One
example of LWP migration is load balancing; the scheduler periodically
looks for CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag, and if it's set on an LWP,
the scheduler doesn't schedule that LWP. The scheduler tries to migrate a scheduled
LWP when it is leaving a running CPU, i.e., in mi_switch. And mi_switch does NOT check
the LP_BOUND flag. So if an LWP is scheduled first and then it sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptibility.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ...


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides a
real-time (RT) and time-sharing (TS) queues, ready to support a POSIX
real-time extensions, and also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forwards branches which are not
usually taken for the old code, and one forwards branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symbol definition error
due to code duplication by the gcc4 optimizer. Also note this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are not likely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.
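A minimal sketch of the two bit orderings, assuming 32 run queues where queue 0 has the highest priority. lsb_bit(), msb_bit(), lsb_first() and msb_first() are hypothetical names for illustration. __builtin_clz() is a GCC/Clang builtin standing in for a count-leading-zeros instruction such as PowerPC's cntlzw; that this is what makes MSB order attractive on powerpc is an assumption, not something stated in the commit.

```c
#include <stdint.h>
#include <strings.h> /* ffs() */

/* Hypothetical helpers: queue q runs from 0 (highest) to 31 (lowest). */

/* Default LSB ordering: queue q is represented by bit q. */
static uint32_t
lsb_bit(int q)
{
	return (uint32_t)1 << q;
}

/* MSB ordering (__HAVE_BIGENDIAN_BITOPS): queue q is bit 31 - q. */
static uint32_t
msb_bit(int q)
{
	return UINT32_C(0x80000000) >> q;
}

/* Lowest-numbered non-empty queue under LSB order, or -1 if none. */
static int
lsb_first(uint32_t whichqs)
{
	/* ffs() returns 1 + index of the lowest set bit, or 0 if none. */
	return ffs((int)whichqs) - 1;
}

/* Lowest-numbered non-empty queue under MSB order, or -1 if none. */
static int
msb_first(uint32_t whichqs)
{
	/* __builtin_clz() is undefined for 0, so test that first. */
	return whichqs == 0 ? -1 : __builtin_clz(whichqs);
}
```

With queues 5 and 12 non-empty, lsb_first(lsb_bit(5) | lsb_bit(12)) and msb_first(msb_bit(5) | msb_bit(12)) both return 5: the scheduler picks the same queue, and only the bit layout and the lookup instruction differ.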


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get the appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process, so we can't restart scheduling;
this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes except curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all CPUs rather
than from a timer callout; this allows time-slicing on non-primary CPUs.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to procs, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left are the
header files, as UVM still uses (most of) them.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a load average of one. I killed
this; it makes the behavior hard to control and almost impossible to analyze,
and the effect (approximately nothing for the first second, then somewhat
increased niceness after three seconds or more, depending on load average)
is pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS IDs, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.313 30-Jan-2018 ozaki-r

Apply C99-style struct initialization to syncobj_t


Revision tags: tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: matt-nb8-mediatek-base perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptability.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline function that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- Few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case when l_cpu changes to l_target_cpu and causes
the locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlewlp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared to support CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlewlp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptable. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forwards branches which are not
usually taken for the old code, and one forwards branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forwards branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also be possible to add the times to the processes profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable the asm statement which defines the bpendtsleep symbol as a "handy
breakpoint" on all m68k ports, since it may cause a multiple symbol definition
error due to code duplication by the gcc4 optimizer. Also note this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
on each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve the comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use the SIGACTION() macro to get the appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process, so we can't restart scheduling;
this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all cpu's rather
than from a timer callout; this allows time-slicing on non-primary cpu's.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurance an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broke.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~nothing at all for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS IDs, take two. They're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly machine-independent switch, and changes to match. Also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.312 06-Aug-2017 christos

use the same string for the log and uprintf.


Revision tags: perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsuya Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptibility.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo: once hz drops below n, the
integer division yields 0 and the caller sleeps indefinitely. This
leads to unwanted behaviour when hz gets below some impossible-to-know
limit. With a usec2ticks() routine it would at least be a little more
tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline functions that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu; it should already be correct on return (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- A few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
it will be handled at EOI anyway. This can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case where l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches which are not
usually taken for the old code, and one forward branch that is usually taken
for my 'improved version'. Since (IIRC) both Athlon and P4 will predict
forward branches 'not taken', the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable the asm statement which defines the bpendtsleep symbol as a "handy
breakpoint" on all m68k ports, since it may cause a multiple symbol definition
error through code duplication by the gcc4 optimizer. Also note this in a comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
on each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with a unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 Hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve the comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get the appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all cpu's rather
than from a timer callout; this allows time-slicing on non-primary cpu's.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) them.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broke.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a load average of one. I killed
this; it makes the behavior hard to control, almost impossible to analyze,
and the effect (~nothing for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.311 03-Jul-2016 christos

GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.310 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.309 13-Oct-2015 pgoyette

When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2


Revision tags: netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.308 28-Feb-2014 skrll

branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes


# 1.307 15-Sep-2013 martin

Remove __CT_LOCAL_.. hack


# 1.306 14-Sep-2013 martin

Guard a function local CTASSERT with prologue/epilogue


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.305 02-Sep-2012 mlelstv

branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.


# 1.304 30-Aug-2012 matt

Add a few more KASSERT/KASSERTMSG


# 1.303 18-Aug-2012 christos

PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.


# 1.302 27-Jul-2012 matt

Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.301 21-Apr-2012 rmind

Improve the assert message.


# 1.300 18-Apr-2012 yamt

comment


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base4
# 1.299 03-Mar-2012 matt

If IPL_SAFEPRI is defined, use it to initialize safepri.


Revision tags: jmcneill-usbmp-base5 jmcneill-usbmp-base3
# 1.298 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: jmcneill-usbmp-base2 netbsd-6-base
# 1.297 28-Jan-2012 rmind

branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2
# 1.296 06-Nov-2011 dholland

branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@


Revision tags: yamt-pagecache-base
# 1.295 05-Oct-2011 njoly

branches: 1.295.2;
Include sys/syslog.h for log(9).


# 1.294 05-Oct-2011 apb

revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().


# 1.293 05-Oct-2011 apb

When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.


# 1.292 05-Oct-2011 apb

Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.


# 1.291 27-Sep-2011 jym

Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html


# 1.290 30-Jul-2011 christos

Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.289 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly.


# 1.288 02-May-2011 rmind

Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.


# 1.287 14-Apr-2011 matt

Add an assert to make sure no unexpected spinlocks are held in mi_switch


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base
# 1.286 03-Jan-2011 pooka

branches: 1.286.2;
update comment


Revision tags: matt-mips64-premerge-20101231
# 1.285 18-Dec-2010 rmind

mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.


Revision tags: uebayasi-xip-base4
# 1.284 02-Nov-2010 pooka

KASSERT we don't kpause indefinitely without interruptability.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.283 30-Apr-2010 martin

Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync


Revision tags: uebayasi-xip-base1
# 1.282 20-Apr-2010 rmind

sched_pstats: fix previous, exclude system/softintr threads from loadavg.


# 1.281 16-Apr-2010 rmind

- Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.


Revision tags: yamt-nfs-mp-base9
# 1.280 03-Mar-2010 yamt

branches: 1.280.2;
remove redundant checks of PK_MARKER.


# 1.279 23-Feb-2010 darran

DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline function that are empty when KDTRACE_HOOKS is not
defined.


# 1.278 21-Feb-2010 darran

DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.


# 1.277 21-Feb-2010 darran

Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).


# 1.276 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


# 1.275 18-Feb-2010 skrll

Fix comment(s).

OK'ed by rmind


Revision tags: uebayasi-xip-base
# 1.274 30-Dec-2009 rmind

branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.


Revision tags: matt-premerge-20091211
# 1.273 05-Dec-2009 pooka

tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.


# 1.272 05-Dec-2009 pooka

Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.


Revision tags: jym-xensuspend-nbase
# 1.271 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.270 03-Oct-2009 elad

- Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.


# 1.269 03-Oct-2009 elad

Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!


# 1.268 03-Oct-2009 elad

Move sched policy back to the subsystem.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.267 19-Jul-2009 yamt

set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.


Revision tags: yamt-nfs-mp-base6
# 1.266 29-Jun-2009 yamt

update a comment


# 1.265 28-Jun-2009 rmind

Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.264 16-Apr-2009 ad

kpreempt: fix another bug, uintptr_t -> bool truncation.


# 1.263 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.262 15-Apr-2009 yamt

kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.


# 1.261 28-Mar-2009 rmind

- kpreempt_disabled: constify l.
- A few predictions.
- KNF.


Revision tags: nick-hppapmap-base2
# 1.260 04-Feb-2009 ad

branches: 1.260.2;
Warn once and no more about backwards monotonic clock.


# 1.259 28-Jan-2009 rmind

sched_pstats: add a few checks to catch the problem. OK by <ad>.


Revision tags: mjf-devfs2-base
# 1.258 21-Dec-2008 ad

Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.


# 1.257 20-Dec-2008 ad

Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.


# 1.256 13-Dec-2008 ad

PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.255 15-Nov-2008 skrll

s/process/LWP/ in comments where appropriate.


Revision tags: netbsd-5-0-RC1 netbsd-5-base
# 1.254 29-Oct-2008 smb

branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....


# 1.253 29-Oct-2008 skrll

Typo in comment.


Revision tags: matt-mips64-base2 haad-dm-base1
# 1.252 15-Oct-2008 wrstuden

branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.251 25-Jul-2008 uwe

Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.


# 1.250 02-Jul-2008 rmind

branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.


# 1.249 02-Jul-2008 rmind

Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 yamt-pf42-base3 wrstuden-revivesa-base
# 1.248 31-May-2008 ad

branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.247 29-May-2008 ad

lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.


# 1.246 29-May-2008 rmind

Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.


# 1.245 27-May-2008 ad

Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.


# 1.244 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.243 19-May-2008 ad

Reduce ifdefs due to MULTIPROCESSOR slightly.


# 1.242 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.241 30-Apr-2008 ad

branches: 1.241.2;
Avoid unneeded AST faults.


# 1.240 30-Apr-2008 ad

kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.


# 1.239 30-Apr-2008 ad

Reapply 1.235 which was lost with a subsequent merge.


# 1.238 29-Apr-2008 ad

Ignore processes with PK_MARKER set.


# 1.237 29-Apr-2008 rmind

Split the runqueue management code into the separate file.
OK by <ad>.


# 1.236 29-Apr-2008 ad

Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.


# 1.235 28-Apr-2008 ad

EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC


# 1.234 28-Apr-2008 ad

Make the preemption switch a __HAVE instead of an option.


# 1.233 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


# 1.232 28-Apr-2008 ad

Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.


# 1.231 28-Apr-2008 ad

Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.


Revision tags: yamt-nfs-mp-base
# 1.230 27-Apr-2008 ad

branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.


# 1.229 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.228 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.227 13-Apr-2008 yamt

branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.


# 1.226 13-Apr-2008 yamt

sched_print_runqueue: fix printf formats.


# 1.225 13-Apr-2008 dogcow

Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.


# 1.224 12-Apr-2008 ad

Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).


# 1.223 12-Apr-2008 ad

Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.222 02-Apr-2008 ad

yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.221 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


# 1.220 16-Mar-2008 rmind

Work around the case when l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.


# 1.219 12-Mar-2008 ad

Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.


# 1.218 11-Mar-2008 ad

Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.217 14-Feb-2008 ad

branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.216 15-Jan-2008 rmind

Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.


Revision tags: matt-armv6-base
# 1.215 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.214 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.213 27-Dec-2007 ad

sched_pstats: need proclist_mutex to send signals.


Revision tags: vmlocking2-base3
# 1.212 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-pm-base reinoud-bufcleanup-base
# 1.211 03-Dec-2007 ad

branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.


Revision tags: vmlocking-nbase
# 1.210 03-Dec-2007 ad

For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).


# 1.209 02-Dec-2007 ad

- mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.


# 1.208 29-Nov-2007 ad

cv_init(&lbolt, "lbolt");


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.207 12-Nov-2007 ad

Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.206 10-Nov-2007 ad

Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlelwp. Should fix kern/36398. Untested due to connectivity issues.


# 1.205 06-Nov-2007 ad

Fix merge error. Spotted by rmind@.


Revision tags: jmcneill-base
# 1.204 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.203 04-Nov-2007 rmind

branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.


# 1.202 29-Oct-2007 yamt

reduce dependencies on opt_sched.h.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3
# 1.201 13-Oct-2007 rmind

branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.


Revision tags: vmlocking-base
# 1.200 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


# 1.199 08-Oct-2007 ad

Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.


Revision tags: yamt-x86pmap-base2
# 1.198 03-Oct-2007 ad

- sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.


# 1.197 02-Oct-2007 ad

Fix assertion that broke debug kernels.


# 1.196 01-Oct-2007 ad

Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.


# 1.195 25-Sep-2007 ad

curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base matt-mips64-base
# 1.194 06-Aug-2007 yamt

branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.


# 1.193 04-Aug-2007 ad

Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.


# 1.192 02-Aug-2007 rmind

branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..


# 1.191 01-Aug-2007 ad

Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.190 09-Jul-2007 ad

branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.189 31-May-2007 ad

setrunnable: adjust to slightly different locking strategy post yamt-idlelwp.
Should fix kern/36398. Untested due to connectivity issues.


# 1.188 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.187 11-Mar-2007 ad

branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.186 04-Mar-2007 christos

branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.185 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.184 26-Feb-2007 yamt

implement priority inheritance.


# 1.183 23-Feb-2007 ad

setrunnable(): don't require that sleeps be interruptible. This breaks
smbfs. Fixes PR/35787.


# 1.182 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.181 19-Feb-2007 dsl

Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches which are not
usually taken for the old code, and one forward branch that is usually taken
for my 'improved version'. Since (IIRC) both athlon and P4 will predict
forward branches 'not taken' the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.


# 1.180 18-Feb-2007 dsl

Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also to be possible to add the times to the process's profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.


# 1.179 18-Feb-2007 dsl

Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.


# 1.178 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.177 15-Feb-2007 ad

branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.176 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


# 1.175 10-Feb-2007 christos

avoid using struct proc in the perfctrs case, where the variable might
not be used.


Revision tags: post-newlock2-merge
# 1.174 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.173 03-Nov-2006 ad

branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.


# 1.172 02-Nov-2006 yamt

ltsleep: fix a race with wakeup().


# 1.171 01-Nov-2006 yamt

remove some __unused from function parameters.


# 1.170 01-Nov-2006 yamt

kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.


# 1.169 01-Nov-2006 yamt

mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.


Revision tags: yamt-splraiseipl-base2
# 1.168 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.167 07-Sep-2006 mrg

branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.


Revision tags: yamt-pdpolicy-base8
# 1.166 02-Sep-2006 christos

branches: 1.166.2;
deal with empty if bodies


# 1.165 30-Aug-2006 tsutsui

Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symble definition error
by code duplication of gcc4 optimizer. Also note about this in comment.


# 1.164 17-Aug-2006 christos

Fix all the -D*DEBUG* code that was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.163 08-Jul-2006 matt

Don't define bpendtsleep on vax (the gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).


Revision tags: yamt-pdpolicy-base6
# 1.162 24-Jun-2006 mrg

don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.


Revision tags: chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.161 14-May-2006 elad

branches: 1.161.4;
integrate kauth.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.160 27-Dec-2005 chs

branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.


# 1.159 26-Dec-2005 perry

u_intN_t -> uintN_t


# 1.158 24-Dec-2005 perry

Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.157 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
on each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.156 20-Dec-2005 rpaulo

Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.


# 1.155 15-Dec-2005 yamt

updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.


# 1.154 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.153 01-Nov-2005 yamt

make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


# 1.152 30-Oct-2005 yamt

- localize some definitions.
- use PPQ macro where appropriate.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.151 06-Oct-2005 yamt

branches: 1.151.2;
uninline scheduler hooks.


# 1.150 02-Oct-2005 chs

avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.


# 1.149 29-May-2005 christos

branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.148 02-Mar-2005 mycroft

branches: 1.148.2;
Copyright maintenance.


# 1.147 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.146 09-Dec-2004 matt

branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.


Revision tags: kent-audio1-base
# 1.145 01-Oct-2004 yamt

introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.144 18-May-2004 yamt

use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.


# 1.143 12-May-2004 yamt

use callout_schedule() for schedcpu().


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.142 14-Mar-2004 cl

add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP


# 1.141 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.140 04-Jan-2004 kleink

; may be a comment character in assembly, use \n as a separator instead.


# 1.139 02-Nov-2003 cl

Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.


# 1.138 26-Oct-2003 fvdl

Fix (bogus) uninitialized variable warning.


# 1.137 08-Sep-2003 itojun

truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html


# 1.136 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.135 28-Jul-2003 matt

Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).


# 1.134 18-Jul-2003 matt

Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.


# 1.133 17-Jul-2003 fvdl

Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.


# 1.132 29-Jun-2003 fvdl

branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.131 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.130 26-Jun-2003 nathanw

Whitespace police.


# 1.129 26-Jun-2003 nathanw

For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.


# 1.128 20-May-2003 simonb

Sprinkle a little white-space.


# 1.127 08-May-2003 matt

In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.


# 1.126 04-Feb-2003 pk

ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).


# 1.125 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.124 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.123 21-Jan-2003 christos

step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.


# 1.122 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge nathanw_sa_base
# 1.121 15-Jan-2003 thorpej

Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.


# 1.120 12-Jan-2003 pk

schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().


Revision tags: fvdl_fs64_base
# 1.119 29-Dec-2002 thorpej

* Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.


# 1.118 29-Dec-2002 thorpej

Add a comment about affinity to awaken().


# 1.117 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.116 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.115 03-Nov-2002 nisimura

branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.114 29-Sep-2002 gmcgarry

Back out __HAVE_CHOOSEPROC stuff.


# 1.113 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.112 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: gehenna-devsw-base
# 1.111 07-Aug-2002 briggs

Only include sys/pmc.h if PERFCTRS is defined.


# 1.110 07-Aug-2002 briggs

Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.


# 1.109 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base
# 1.108 21-May-2002 thorpej

Move kernel_lock manipulation into functions so that they will
show up in a profile.


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.107 30-Nov-2001 kleink

branches: 1.107.4; 1.107.8;
asm -> __asm.


Revision tags: thorpej-mips-cache-base
# 1.106 12-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.105 25-Sep-2001 chs

branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.104 28-May-2001 chs

branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.


# 1.103 27-Apr-2001 jdolecek

Slightly improve comment for ltsleep(), the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.102 20-Apr-2001 thorpej

Make sure there is a curproc in ltsleep().


# 1.101 14-Jan-2001 thorpej

branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.


# 1.100 01-Jan-2001 sommerfeld

MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.


# 1.99 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.98 12-Nov-2000 jdolecek

use SIGACTION() macro to get on appropriate sigaction
structure


# 1.97 23-Sep-2000 enami

Stop runnable but swapped out user processes also in suspendsched().


# 1.96 15-Sep-2000 enami

The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.


# 1.95 14-Sep-2000 thorpej

Make sure to lock the proclist when we're traversing allproc.


# 1.94 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.93 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.92 01-Sep-2000 bouyer

wakeup()->sched_wakeup()


# 1.91 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of users process, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also marks all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.90 26-Aug-2000 sommerfeld

Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().


# 1.89 26-Aug-2000 sommerfeld

On second thought.. pass cpu_info * to roundrobin() explicitly.


# 1.88 26-Aug-2000 sommerfeld

More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all CPUs rather
than from a timer callout; this allows time-slicing on non-primary CPUs.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.


# 1.87 25-Aug-2000 thorpej

Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.


# 1.86 24-Aug-2000 thorpej

Correct a comment.


# 1.85 24-Aug-2000 sommerfeld

Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().


# 1.84 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.83 20-Aug-2000 thorpej

Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.


# 1.82 07-Aug-2000 thorpej

Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.


# 1.81 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


# 1.80 02-Aug-2000 nathanw

principal -> principle (in a comment)


# 1.79 27-Jun-2000 mrg

remove include of <vm/vm.h>


Revision tags: netbsd-1-5-base
# 1.78 10-Jun-2000 sommerfeld

branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.


# 1.77 08-Jun-2000 thorpej

Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.76 31-May-2000 thorpej

Track which CPU a process is running on/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


Revision tags: minoura-xpg4dl-base
# 1.75 27-May-2000 thorpej

branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.


# 1.74 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* function need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.73 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.72 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.71 30-Mar-2000 augustss

Get rid of register declarations.


# 1.70 28-Mar-2000 simonb

endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().


# 1.69 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.68 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.67 15-Nov-1999 fvdl

Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.66 14-Oct-1999 ross

branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.


# 1.65 17-Sep-1999 thorpej

branches: 1.65.2;
Centralize the declaration and clearing of `cold'.


# 1.64 15-Sep-1999 thorpej

Be slightly more informative in the tsleep() diagnostics.


Revision tags: chs-ubc2-base
# 1.63 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.62 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock in
the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.61 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.60 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.


# 1.59 21-Apr-1999 mrg

revert previous. oops.


# 1.58 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.57 24-Mar-1999 mrg

branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.56 28-Feb-1999 ross

schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods


# 1.55 23-Feb-1999 ross

Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (almost nothing at all for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.54 04-Nov-1998 chs

LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.


# 1.53 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


Revision tags: eeh-paddr_t-base
# 1.52 04-Jul-1998 jonathan

defopt DDB.


# 1.51 25-Jun-1998 thorpej

defopt KTRACE


# 1.50 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.49 12-Feb-1998 kleink

Fix variable declarations: register -> register int.


# 1.48 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.47 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.46 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.45 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.44 07-May-1997 gwr

branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c


Revision tags: is-newarp-before-merge is-newarp-base
# 1.43 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().


# 1.42 15-Oct-1996 cgd

reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.


# 1.41 13-Oct-1996 christos

backout previous kprintf change


# 1.40 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.39 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.38 17-Jul-1996 explorer

Add compile-time and run-time control over automatic nicing


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.37 22-Apr-1996 christos

branches: 1.37.4;
remove include of <sys/cpu.h>


# 1.36 30-Mar-1996 christos

Fix db_printf formats.


# 1.35 09-Feb-1996 christos

More proto fixes


# 1.34 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.33 08-Jun-1995 mycroft

Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.


# 1.32 22-Apr-1995 christos

- new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.


# 1.31 19-Mar-1995 mycroft

Use %p.


# 1.30 30-Oct-1994 cgd

be more careful with types, also pull in headers where necessary.


# 1.29 30-Aug-1994 mycroft

Display emulation type.


# 1.28 30-Aug-1994 mycroft

Clean up some debugging code.


# 1.27 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.26 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.25 18-May-1994 cgd

mostly-machine-independent switch, and changes to match. also, hack init_main


# 1.24 14-May-1994 glass

missing rcsid


# 1.23 13-May-1994 cgd

setrq -> setrunqueue, sched -> scheduler


# 1.22 07-May-1994 cgd

function name changes


# 1.21 06-May-1994 mycroft

Put some more code in splstatclock(), just to be safe.


# 1.20 05-May-1994 mycroft

Now setpri() is really toast.


# 1.19 05-May-1994 mycroft

setpri() is toast.


# 1.18 05-May-1994 mycroft

Remove now-bogus casts.


# 1.17 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.16 04-May-1994 cgd

Rename a lot of process flags.


# 1.15 29-Apr-1994 cgd

change timeout/untimeout/wakeup/sleep/tsleep args to void *


# 1.14 22-Dec-1993 cgd

cast to match header (changed back...)


# 1.13 20-Dec-1993 cgd

load average changes from magnum


# 1.12 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.11 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


# 1.10 29-Aug-1993 cgd

branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.9 15-Jul-1993 brezak

Add 'ps' command. Add -more- pager to output from Mach ddb.


# 1.8 27-Jun-1993 andrew

#endif was somehow missing from the end of a DDB conditional!


# 1.7 27-Jun-1993 andrew

ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.6 27-Jun-1993 glass

another NDDB -> DDB change. why did DDB invade kern/*?


# 1.5 20-May-1993 cgd

add $Id$ strings, and clean up file headers where necessary


# 1.4 15-Apr-1993 glass

i hate NDDB......


Revision tags: netbsd-0-8 netbsd-alpha-1
# 1.3 10-Apr-1993 glass

fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)


Revision tags: patchkit-0-2-2
# 1.2 21-Mar-1993 cgd

after 0.2.2 "stable" patches applied


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision