History log of /openbsd-current/sys/kern/kern_resource.c
Revision Date Author Comments
# 1.84 03-Jun-2024 claudio

Remove the now unused s argument to SCHED_LOCK and SCHED_UNLOCK.

The SPL level is now tracked by the mutex and we no longer need to track
this in the callers.
OK miod@ mlarkin@ tb@ jca@


# 1.83 22-May-2024 claudio

Just grab the SCHED_LOCK() once in donice() before walking the ps_threads
list. setpriority() is trivial and probably faster than releasing and
relocking SCHED_LOCK().
OK jca@


# 1.82 20-May-2024 claudio

Rework interaction between sleep API and exit1() and start unlocking ps_threads

This diff adjusts how single_thread_set() accounts the threads by using
ps_threadcnt as initial value and counting all threads out that are already
parked. In single_thread_check call exit1() before decreasing ps_singlecount;
this is now done in exit1().

exit1() and thread_fork() ensure that ps_threadcnt is updated with the
pr->ps_mtx held and in exit1() also account for exiting threads since
exit1() can sleep.

OK mpi@


# 1.81 17-Apr-2024 claudio

dogetrusage() must be called with the KERNEL_LOCK held for now.
OK mpi@


Revision tags: OPENBSD_7_4_BASE OPENBSD_7_5_BASE
# 1.80 13-Sep-2023 claudio

Revert commitid: yfAefyNWibUyjkU2, ESyyH5EKxtrXGkS6 and itscfpFvJLOj8mHB;

The change to the single thread API results in crashes inside exit1()
as found by Syzkaller. There seems to be a race in the exit codepath.
What exactly fails is not really clear, therefore revert for now.

This should fix the following Syzkaller reports:
Reported-by: syzbot+38efb425eada701ca8bb@syzkaller.appspotmail.com
Reported-by: syzbot+ecc0e8628b3db39b5b17@syzkaller.appspotmail.com
and maybe more.

Reverted commits:


# 1.79 08-Sep-2023 claudio

Change how ps_threads and p_thr_link are locked away from using SCHED_LOCK.

The per process thread list can be traversed (read) by holding either
the KERNEL_LOCK or the per process ps_mtx (instead of SCHED_LOCK).
Abusing the SCHED_LOCK for this makes it impossible to split up the
scheduler lock into something more fine grained.

Tested by phessler@, ok mpi@


# 1.78 29-Aug-2023 claudio

Remove p_rtime from struct proc and replace it by passing the timespec
as argument to the tuagg_locked function.

- Remove incorrect use of p_rtime in other parts of the tree. p_rtime was
almost always 0 so including it in any sum did not alter the result.
- In main() the update of time can be further simplified since at that time
only the primary cpu is running.
- Add missing nanouptime() call in cpu_hatch() for hppa
- Rename tuagg_unlocked to tuagg_locked like it is done in the rest of
the tree.

OK cheloha@ dlg@


Revision tags: OPENBSD_7_3_BASE
# 1.77 04-Feb-2023 cheloha

kernel: stathz is always non-zero after cpu_initclocks()

Now that the clockintr switch is complete, cpu_initclocks() always
initializes stathz to a non-zero value. We don't call statclock()
from hardclock(9) anymore and, more broadly, we don't need to test
whether stathz is non-zero before using it.

With input from kettenis@.

Link: https://marc.info/?l=openbsd-tech&m=167434223309668&w=2

ok kettenis@ miod@


# 1.76 17-Nov-2022 deraadt

stack growth from setrlimit was never updated to set UVM_ET_STACK on
the entries, so the check-sp-at-system-call check failed. Quite strange
it took this long to find this.
ok kettenis


# 1.75 07-Oct-2022 deraadt

Add mimmutable(2) system call which locks the permissions (PROT_*) of
memory mappings so they cannot be changed by a later mmap(), mprotect(),
or munmap(), which will error with EPERM instead.
ok kettenis


Revision tags: OPENBSD_7_2_BASE
# 1.74 28-May-2022 deraadt

oops, wrong value in previous commit


# 1.73 28-May-2022 deraadt

64K of locked memory should be enough for anyone (until we hear a good
reason why)
discussed with many, ok millert


Revision tags: OPENBSD_7_1_BASE
# 1.72 18-Mar-2022 visa

Use the refcnt API with struct plimit.

OK bluhm@ dlg@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.71 08-Feb-2021 mpi

Revert the conversion of the per-process thread list into a SMR_TAILQ.

We did not reach a consensus about using SMR to unlock single_thread_set()
so there's no point in keeping this change.


# 1.70 07-Dec-2020 mpi

Convert the per-process thread list into a SMR_TAILQ.

Currently all iterations are done under KERNEL_LOCK() and therefore use
the *_LOCKED() variant.

From and ok claudio@


Revision tags: OPENBSD_6_8_BASE
# 1.69 25-Sep-2020 cheloha

setpriority(2): don't treat booleans as scalars

The variable "found" in sys_setpriority() is used as a boolean.
We should set it to 1 to indicate that we found the object we
were looking for instead of incrementing it.

deraadt@ notes that the current code is not buggy, because OpenBSD
cannot support anywhere near 2^32 processes, but agrees that
incrementing the variable signals the wrong thing to the reader.

ok millert@ deraadt@
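
A minimal sketch of the distinction, not the actual sys_setpriority() code:
the function name and the array-based lookup are hypothetical, chosen only to
contrast assigning a boolean flag with incrementing it.

```c
#include <stddef.h>

/*
 * Hypothetical illustration: report whether any entry in ids[] matches
 * the wanted id. "found" is a boolean, so it is assigned 1 rather than
 * incremented; the result is always exactly 0 or 1, however many
 * entries match.
 */
int
any_match(const int *ids, size_t n, int wanted)
{
	int found = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (ids[i] == wanted)
			found = 1;	/* not found++ */
	}
	return found;
}
```

With the assignment form, the value carries no count that a reader might be
tempted to rely on, and it cannot wrap around.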


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.68 15-Jul-2019 mpi

Stop calling resched_proc() after changing the nice(3) value of a process.

Changing the scheduling priority of a process happens rarely, so it isn't
strictly necessary to update the current priority of every thread
instantly.

Moreover resched_proc() isn't well suited to perform this action: it doesn't
consider the state of each thread nor move them to another runqueue.

ok visa@


# 1.67 08-Jul-2019 mpi

Untangle code setting the scheduling priority of a thread.

- `p_estcpu' and `p_usrpri' represent the priority and are now only set
in a single function.

- Call resched_proc() after updating the priority and stop calling it
from schedclock() since `spc_curpriority' should match curproc's priority.

- Rename updatepri() to match decay_cpu() and stop updating per-thread
member.

- Merge two resched_proc() in one inside setrunnable().

Tweak and ok visa@


# 1.66 24-Jun-2019 visa

Guard uvm_map_protect() with kernel lock to prepare dosetrlimit()
for unlocking.

OK semarie@ mpi@ deraadt@ anton@


# 1.65 21-Jun-2019 visa

Make resource limit access MP-safe. So far, the copy-on-write sharing
of resource limit structs has been done between processes. By applying
copy-on-write also between threads, threads can read rlimits in
a nearly lock-free manner.

Inspired by code in DragonFly BSD and FreeBSD.

OK mpi@, agreement from jmatthew@ and anton@
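
The copy-on-write idea can be sketched roughly as follows. This is a
single-threaded illustration under stated assumptions, not the kernel's
struct plimit code: the struct layout, NLIMITS, and all function names are
hypothetical, and a real MP-safe version would need atomic reference counting
and its own locking around the pointer swap.

```c
#include <stdlib.h>
#include <string.h>

#define NLIMITS 4	/* hypothetical; chosen only for the sketch */

struct limits {
	unsigned int	refcnt;		/* number of holders sharing this copy */
	long		value[NLIMITS];
};

/* Take a shared reference for reading; no copy is made. */
struct limits *
limits_share(struct limits *l)
{
	l->refcnt++;
	return l;
}

/* Drop a reference and free the struct once nobody holds it. */
void
limits_release(struct limits *l)
{
	if (--l->refcnt == 0)
		free(l);
}

/*
 * Return a struct that is safe to modify. A sole holder gets the same
 * struct back; otherwise a private copy is made first, and only then is
 * the caller's reference to the shared struct dropped.
 * Returns NULL if allocation fails, leaving the reference untouched.
 */
struct limits *
limits_writable(struct limits *l)
{
	struct limits *copy;

	if (l->refcnt == 1)
		return l;
	copy = malloc(sizeof(*copy));
	if (copy == NULL)
		return NULL;
	memcpy(copy, l, sizeof(*copy));
	copy->refcnt = 1;
	l->refcnt--;
	return copy;
}
```

Readers only ever see a struct that no writer is modifying in place, which is
what makes the read path cheap; the cost of copying is paid by the rare
writer.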


# 1.64 10-Jun-2019 visa

Avoid changing resource limits in rucheck() by introducing a new state
variable that tracks when to send next SIGXCPU. This eases MP work and
prevents accidental alteration of shared resource limit structs.

OK mpi@ semarie@


# 1.63 02-Jun-2019 visa

Move initialization of limit0 into a dedicated function. This new
function is also a proper place for setting up the plimit pool.

While here, raise the IPL of the plimit pool to IPL_MPFLOOR, needed
in upcoming MP work.

OK claudio@


# 1.62 01-Jun-2019 mpi

Revert to using the SCHED_LOCK() to protect time accounting.

It currently creates a lock ordering problem because SCHED_LOCK() is taken
by hardclock(). That means the "priorities" of a thread should be moved
out of the SCHED_LOCK() first in order to make progress.

Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com
via anton@ as well as by kettenis@


# 1.61 31-May-2019 mpi

Use a per-process mutex to protect time accounting instead of SCHED_LOCK().

Note that hardclock(9) still increments p_{u,s,i}ticks without holding a
lock.

ok visa@, cheloha@


# 1.60 31-May-2019 visa

Rename struct plimit field p_refcnt to pl_refcnt to avoid confusion
with the fields of struct proc. Make pl_refcnt unsigned for upcoming
atomic updating.

OK deraadt@ guenther@


Revision tags: OPENBSD_6_5_BASE
# 1.59 06-Jan-2019 visa

Fix unsafe use of ptsignal() in mi_switch().

ptsignal() has to be called with the kernel lock held. As ensuring the
locking in mi_switch() is not easy, and deferring the signaling using
the task API is not possible because of lock order issues in
mi_switch(), move the CPU time checking into a periodic timer where
the kernel can be locked without issues.

With this change, each process has a dedicated resource check timer.
The timer gets activated only when a CPU time limit is set. Because the
checking is not done as frequently as before, some precision is lost.

Use of timers adapted from FreeBSD.

OK tedu@

Reported-by: syzbot+2f5d62256e3280634623@syzkaller.appspotmail.com


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.58 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.

No objection from millert@, ok tedu@, bluhm@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.57 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.56 25-Aug-2016 dlg

pool_setipl

ok kettenis@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.55 05-Dec-2015 tedu

remove stale lint annotations


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.54 09-Feb-2015 miod

Stop using USRSTACK as the edge of the stack, but rather use the vmspace
vm_minsaddr or vm_maxsaddr, depending upon the direction the stack goes in.

This should have no effect on the existing behaviour.

ok kettenis@ deraadt@


# 1.53 19-Dec-2014 tedu

start retiring the nointr allocator. specify PR_WAITOK as a flag as a
marker for which pools are not interrupt safe. ok dlg


# 1.52 10-Dec-2014 tedu

convert bcopy to memcpy. ok millert


# 1.51 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.50 30-Mar-2014 guenther

Eliminates struct pcred by moving the real and saved ugids into
struct ucred; struct process then directly links to the ucred

Based on a discussion at c2k10 or so before noting that FreeBSD and
NetBSD did this too.

ok matthew@


Revision tags: OPENBSD_5_5_BASE
# 1.49 24-Jan-2014 guenther

exit1() needs to do a final aggregation of the thread's [us]ticks
and runtime to the process totals. Also, add ktracing of struct
rusage in wait4() and getrusage().

problem pointed out by tedu@
ok deraadt@


# 1.48 21-Jan-2014 tedu

bzero -> memset


# 1.47 20-Jan-2014 guenther

Threads can't be zombies, only processes, so change zombproc to zombprocess,
make it a list of processes, and change P_NOZOMBIE and P_STOPPED from thread
flags to process flags. Add allprocess list for the code that just wants
to see processes.

ok tedu@


# 1.46 25-Oct-2013 guenther

Move the declarations for dogetrusage(), itimerround(), and dowait4()
to sys/*.h headers so that the compat/linux code can use them.
Change dowait4() to not copyout() the status value, but rather leave
that for its caller, as compat/linux has to translate it, with the
side benefit of simplifying the native code.

Originally written months ago as part of the time_t work; long
memory, prodding, and ok from pirofti@


# 1.45 14-Sep-2013 guenther

Eliminate the unused retval argument from dogetrusage()


# 1.44 14-Sep-2013 guenther

Snapshots for all archs have been built, so remove the T32 code


# 1.43 13-Aug-2013 guenther

Switch time_t, ino_t, clock_t, and struct kevent's ident and data
members to 64bit types. Assign new syscall numbers for (almost
all) the syscalls that involve the affected types, including anything
with time_t, timeval, itimerval, timespec, rusage, dirent, stat,
or kevent arguments. Add a d_off member to struct dirent and replace
getdirentries() with getdents(), thus immensely simplifying and
accelerating telldir/seekdir. Build perl with -DBIG_TIME.

Bump the major on every single base library: the compat bits included
here are only good enough to make the transition; the T32 compat
option will be burned as soon as we've reached the new world and
are happy with the snapshots for all architectures.

DANGER: ABI incompatibility. Updating to this kernel requires extra
work or you won't be able to login: install a snapshot instead.

Much assistance in fixing userland issues from deraadt@ and tedu@
and build assistance from todd@ and otto@


Revision tags: OPENBSD_5_4_BASE
# 1.42 03-Jun-2013 guenther

Convert some internal APIs to use timespecs instead of timevals

ok matthew@ deraadt@


# 1.41 01-Apr-2013 guenther

Make setrlimit() return EINVAL if rlim_cur > rlim_max, per POSIX.
Use limfree() instead of decrementing the reference counter directly.

ok kettenis@
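
The ordering rule amounts to a check like the one below. This is a userland
sketch, not the kernel's dosetrlimit(); the helper name check_rlimit_order is
hypothetical.

```c
#include <errno.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: reject a limit pair whose soft limit (rlim_cur)
 * exceeds its hard limit (rlim_max).
 * Returns 0 if the pair is acceptable, EINVAL otherwise.
 */
int
check_rlimit_order(const struct rlimit *rl)
{
	if (rl->rlim_cur > rl->rlim_max)
		return EINVAL;
	return 0;
}
```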


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.40 10-Apr-2012 guenther

Make the KERN_NPROCS and KERN_MAXPROC sysctl()s and the RLIMIT_NPROC rlimit
count processes instead of threads. New sysctl()s KERN_NTHREADS and
KERN_MAXTHREAD count and limit threads. The nprocs and maxproc kernel
variables are replaced by nprocess, maxprocess, nthreads, and maxthread.

ok tedu@ mikeb@


# 1.39 23-Mar-2012 guenther

Make rusage totals, itimers, and profile settings per-process instead
of per-rthread. Handling of per-thread tick and runtime counters
inspired by how FreeBSD does it.

ok kettenis@


# 1.38 19-Mar-2012 guenther

Add tracing and dumping of "pointer to struct" syscall arguments for
structs timespec, timeval, sigaction, and rlimit.

ok otto@ jsing@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.37 07-Mar-2011 guenther

The scheduling 'nice' value is per-process, not per-thread, so move it
into struct process.

ok tedu@ deraadt@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.36 26-Jul-2010 guenther

Correct the links between threads, processes, pgrps, and sessions,
so that the process-level stuff is to/from struct process and not
struct proc. This fixes a bunch of problem cases in rthreads.
Based on earlier work by blambert and myself, but mostly written
at c2k10.

Tested by many: deraadt, sthen, krw, ray, and in snapshots


# 1.35 29-Jun-2010 guenther

Eliminate struct plimit's PL_SHAREMOD flag: it was for COMPAT_IRIX
sproc() support, but we don't have COMPAT_IRIX.
ok krw@ tedu@


Revision tags: OPENBSD_4_7_BASE
# 1.34 04-Jan-2010 guenther

Don't decrement the refcnt on a plimits until after we're done
copying it, so that the process can't sleep in pool_get() and have
the source structure get pool_put() or modified behind its back.

ok deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.33 22-May-2008 thib

Use LIST_FOREACH() instead of handrolling.

From: Pierre Riteau pierre.riteau_att_gmail.com
OK miod@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.32 12-Apr-2007 tedu

move p_limit and p_cred into struct process
leave macros behind for now to keep the commit small
ok art beck miod pedro


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.31 28-Nov-2005 jsg

ansi/deregister.
'go for it' deraadt@


Revision tags: OPENBSD_3_8_BASE
# 1.30 29-May-2005 deraadt

sched work by niklas and art backed out; causes panics


# 1.29 25-May-2005 niklas

This patch is mostly art's work and was done *a year* ago. Art wants to thank
everyone for the prompt review and ok of this work ;-) Yeah, that includes me
too, or maybe especially me. I am sorry.

Change the sched_lock to a mutex. This fixes, among other things, the infamous
"telnet localhost &" problem. The real bug in that case was that the sched_lock
which is by design a non-recursive lock, was recursively acquired, and not
enough releases made us hold the lock in the idle loop, blocking scheduling
on the other processors. Some of the other processors would hold the biglock though,
which made it impossible for cpu 0 to enter the kernel... A nice deadlock.
Let me just say debugging this for days just to realize that it was all fixed
in an old diff no one ever ok'd was somewhat of an anti-climax.

This diff also changes splsched to be correct for all our architectures.


Revision tags: OPENBSD_3_7_BASE
# 1.28 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_6_BASE
# 1.27 13-Jun-2004 niklas

debranch SMP, have fun


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.26 11-Dec-2003 millert

Add id_t type as per POSIX and use it for [gs]etpriority(2).
OK henning@ and deraadt@


# 1.25 11-Dec-2003 millert

POSIX says rlim_t should be unsigned so make it u_quad_t. Also add
POSIX-mandated RLIM_SAVED_MAX and RLIM_SAVED_CUR defines. On OpenBSD
these are identical to RLIM_INFINITY as allowed by POSIX. OK deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.24 01-Sep-2003 henning

match syscallargs comments with reality
from Patrick Latifi <patrick.l@hermes.usherb.ca>
ok jason@ tedu@


# 1.23 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.22 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.21 15-Oct-2002 nordin

Match reality by changing (u_int) -> (int) in comments.


Revision tags: OPENBSD_3_2_BASE
# 1.20 02-Oct-2002 nordin

branches: 1.20.2;
Check for negative values. Inspiration from tedu <grendel@zeitbombe.org>.
ok deraadt@ and art@


# 1.19 21-Jul-2002 art

Map stack pages without VM_PROT_EXECUTE. Notice that right now this
doesn't do anything since no pmap implements exec protection yet.


Revision tags: OPENBSD_3_1_BASE
# 1.18 25-Jan-2002 art

branches: 1.18.2;
Convert plimit allocations to pool.


# 1.17 20-Dec-2001 nordin

Make user/system times increase monotonically. ok deraadt@ and millert@


Revision tags: UBC_BASE
# 1.16 10-Nov-2001 art

branches: 1.16.2;
Move maxdmap and maxsmap to kern_resource.c


# 1.15 06-Nov-2001 miod

Replace inclusion of <vm/foo.h> with the correct <uvm/bar.h> when necessary.
(Look ma, I might have broken the tree)


Revision tags: OPENBSD_3_0_BASE
# 1.14 27-Jun-2001 art

branches: 1.14.2;
remove old vm


# 1.13 26-May-2001 art

Make it a bit more obvious what dosetrlimit does. (shrink).


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE OPENBSD_2_9_BASE
# 1.12 05-May-2000 art

Add limfree prototype to sys/resourcevar.h.


# 1.11 03-Mar-2000 art

Use LIST_ macros instead of internal field names to walk the allproc list.


Revision tags: SMP_BASE kame_19991208
# 1.10 05-Nov-1999 mickey

branches: 1.10.2;
more stack direction fixes; art@ ok


Revision tags: OPENBSD_2_6_BASE
# 1.9 15-Jul-1999 art

vm_offset_t -> {v,p}addr_t ; vm_size_t -> {v,p}size_t


Revision tags: OPENBSD_2_5_BASE
# 1.8 26-Feb-1999 art

uvm allocation and name changes


Revision tags: OPENBSD_2_1_BASE OPENBSD_2_2_BASE OPENBSD_2_3_BASE OPENBSD_2_4_BASE
# 1.7 24-Nov-1996 millert

Sync with NetBSD. Figure NZERO into priorities and that rlim_cur
and rlim_max are >0.


Revision tags: OPENBSD_2_0_BASE
# 1.6 27-Jul-1996 deraadt

sec can be a long


# 1.5 02-Jul-1996 deraadt

unsigned usec can go negative, should be added in as is; netbsd pr#2585; Juergen.Fluk@lrz.tu-muenchen.de


# 1.4 20-Jun-1996 deraadt

calcru() must calculate using u_quad_t to avoid overflows; netbsd pr#2496, brb@exp.com


# 1.3 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.2 14-Dec-1995 deraadt

from netbsd; limfree()


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.83 22-May-2024 claudio

Just grab the SCHED_LOCK() once in donice() before walking the ps_threads
list. setpriority() is trivial and probably faster than releasing and
relocking SCHED_LOCK().
OK jca@


# 1.82 20-May-2024 claudio

Rework interaction between sleep API and exit1() and start unlocking ps_threads

This diff adjusts how single_thread_set() accounts the threads by using
ps_threadcnt as initial value and counting all threads out that are already
parked. In single_thread_check call exit1() before decreasing ps_singlecount
this is now done in exit1().

exit1() and thread_fork() ensure that ps_threadcnt is updated with the
pr->ps_mtx held and in exit1() also account for exiting threads since
exit1() can sleep.

OK mpi@


# 1.81 17-Apr-2024 claudio

dogetrusage() must be called with the KERNEL_LOCK held for now.
OK mpi@


Revision tags: OPENBSD_7_4_BASE OPENBSD_7_5_BASE
# 1.80 13-Sep-2023 claudio

Revert commitid: yfAefyNWibUyjkU2, ESyyH5EKxtrXGkS6 and itscfpFvJLOj8mHB;

The change to the single thread API results in crashes inside exit1()
as found by Syzkaller. There seems to be a race in the exit codepath.
What exactly fails is not really clear therefor revert for now.

This should fix the following Syzkaller reports:
Reported-by: syzbot+38efb425eada701ca8bb@syzkaller.appspotmail.com
Reported-by: syzbot+ecc0e8628b3db39b5b17@syzkaller.appspotmail.com
and maybe more.

Reverted commits:


# 1.79 08-Sep-2023 claudio

Change how ps_threads and p_thr_link are locked away from using SCHED_LOCK.

The per process thread list can be traversed (read) by holding either
the KERNEL_LOCK or the per process ps_mtx (instead of SCHED_LOCK).
Abusing the SCHED_LOCK for this makes it impossible to split up the
scheduler lock into something more fine grained.

Tested by phessler@, ok mpi@


# 1.78 29-Aug-2023 claudio

Remove p_rtime from struct proc and replace it by passing the timespec
as argument to the tuagg_locked function.

- Remove incorrect use of p_rtime in other parts of the tree. p_rtime was
almost always 0 so including it in any sum did not alter the result.
- In main() the update of time can be further simplified since at that time
only the primary cpu is running.
- Add missing nanouptime() call in cpu_hatch() for hppa
- Rename tuagg_unlocked to tuagg_locked like it is done in the rest of
the tree.

OK cheloha@ dlg@


Revision tags: OPENBSD_7_3_BASE
# 1.77 04-Feb-2023 cheloha

kernel: stathz is always non-zero after cpu_initclocks()

Now that the clockintr switch is complete, cpu_initclocks() always
initializes stathz to a non-zero value. We don't call statclock()
from hardclock(9) anymore and, more broadly, we don't need to test
whether stathz is non-zero before using it.

With input from kettenis@.

Link: https://marc.info/?l=openbsd-tech&m=167434223309668&w=2

ok kettenis@ miod@


# 1.76 17-Nov-2022 deraadt

stack growth from setrlimit was never updated to set UVM_ET_STACK on
the entries, so the check-sp-at-system-call check failed. Quite strange
it took this long to find this.
ok kettenis


# 1.75 07-Oct-2022 deraadt

Add mimmutable(2) system call which locks the permissions (PROT_*) of
memory mappings so they cannot be changed by a later mmap(), mprotect(),
or munmap(), which will error with EPERM instead.
ok kettenis


Revision tags: OPENBSD_7_2_BASE
# 1.74 28-May-2022 deraadt

oops, wrong value in previous commit


# 1.73 28-May-2022 deraadt

64K of locked memory should be enough for anyone (until we hear a good
reason why)
discussed with many, ok millert


Revision tags: OPENBSD_7_1_BASE
# 1.72 18-Mar-2022 visa

Use the refcnt API with struct plimit.

OK bluhm@ dlg@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.71 08-Feb-2021 mpi

Revert the convertion of per-process thread into a SMR_TAILQ.

We did not reach a consensus about using SMR to unlock single_thread_set()
so there's no point in keeping this change.


# 1.70 07-Dec-2020 mpi

Convert the per-process thread list into a SMR_TAILQ.

Currently all iterations are done under KERNEL_LOCK() and therefor use
the *_LOCKED() variant.

From and ok claudio@


Revision tags: OPENBSD_6_8_BASE
# 1.69 25-Sep-2020 cheloha

setpriority(2): don't treat booleans as scalars

The variable "found" in sys_setpriority() is used as a boolean.
We should set it to 1 to indicate that we found the object we
were looking for instead of incrementing it.

deraadt@ notes that the current code is not buggy, because OpenBSD
cannot support anywhere near 2^32 processes, but agrees that
incrementing the variable signals the wrong thing to the reader.

ok millert@ deraadt@


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.68 15-Jul-2019 mpi

Stop calling resched_proc() after changing the nice(3) value of a process.

Changing the scheduling priority of a process happens rarely, so it isn't
strictly necessary to update the current priority of every threads
instantly.

Moreover resched_proc() isn't well suited to perform this action: it doesn't
consider the state of each thread nor move them to another runqueue.

ok visa@


# 1.67 08-Jul-2019 mpi

Untangle code setting the scheduling priority of a thread.

- `p_estcpu' and `p_usrpri' represent the priority and are now only set
in a single function.

- Call resched_proc() after updating the priority and stop calling it
from schedclock() since `spc_curpriority' should match curproc's priority.

- Rename updatepri() to match decay_cpu() and stop updating per-thread
member.

- Merge two resched_proc() in one inside setrunnable().

Tweak and ok visa@


# 1.66 24-Jun-2019 visa

Guard uvm_map_protect() with kernel lock to prepare dosetrlimit()
for unlocking.

OK semarie@ mpi@ deraadt@ anton@


# 1.65 21-Jun-2019 visa

Make resource limit access MP-safe. So far, the copy-on-write sharing
of resource limit structs has been done between processes. By applying
copy-on-write also between threads, threads can read rlimits in
a nearly lock-free manner.

Inspired by code in DragonFly BSD and FreeBSD.

OK mpi@, agreement from jmatthew@ and anton@


# 1.64 10-Jun-2019 visa

Avoid changing resource limits in rucheck() by introducing a new state
variable that tracks when to send next SIGXCPU. This eases MP work and
prevents accidental alteration of shared resource limit structs.

OK mpi@ semarie@


# 1.63 02-Jun-2019 visa

Move initialization of limit0 into a dedicated function. This new
function is also a proper place for setting up the plimit pool.

While here, raise the IPL of the plimit pool to IPL_MPFLOOR, needed
in upcoming MP work.

OK claudio@


# 1.62 01-Jun-2019 mpi

Revert to using the SCHED_LOCK() to protect time accounting.

It currently creates a lock ordering problem because SCHED_LOCK() is taken
by hardclock(). That means the "priorities" of a thread should be moved
out of the SCHED_LOCK() first in order to make progress.

Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com
via anton@ as well as by kettenis@


# 1.61 31-May-2019 mpi

Use a per-process mutex to protect time accounting instead of SCHED_LOCK().

Note that hardclock(9) still increments p_{u,s,i}ticks without holding a
lock.

ok visa@, cheloha@


# 1.60 31-May-2019 visa

Rename struct plimit field p_refcnt to pl_refcnt to avoid confusion
with the fields of struct proc. Make pl_refcnt unsigned for upcoming
atomic updating.

OK deraadt@ guenther@


Revision tags: OPENBSD_6_5_BASE
# 1.59 06-Jan-2019 visa

Fix unsafe use of ptsignal() in mi_switch().

ptsignal() has to be called with the kernel lock held. As ensuring the
locking in mi_switch() is not easy, and deferring the signaling using
the task API is not possible because of lock order issues in
mi_switch(), move the CPU time checking into a periodic timer where
the kernel can be locked without issues.

With this change, each process has a dedicated resource check timer.
The timer gets activated only when a CPU time limit is set. Because the
checking is not done as frequently as before, some precision is lost.

Use of timers adapted from FreeBSD.

OK tedu@

Reported-by: syzbot+2f5d62256e3280634623@syzkaller.appspotmail.com


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.58 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer mess with a per-process field.

No objection from millert@, ok tedu@, bluhm@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.57 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.56 25-Aug-2016 dlg

pool_setipl

ok kettenis@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.55 05-Dec-2015 tedu

remove stale lint annotations


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.54 09-Feb-2015 miod

Stop using USRSTACK as the edge of the stack, but rather use the vmspace
vm_minsaddr or vm_maxsaddr, depending upon the direction the stack goes in.

This should have no effect on the existing behaviourrr.

ok kettenis@ deraadt@


# 1.53 19-Dec-2014 tedu

start retiring the nointr allocator. specify PR_WAITOK as a flag as a
marker for which pools are not interrupt safe. ok dlg


# 1.52 10-Dec-2014 tedu

convert bcopy to memcpy. ok millert


# 1.51 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.50 30-Mar-2014 guenther

Eliminates struct pcred by moving the real and saved ugids into
struct ucred; struct process then directly links to the ucred

Based on a discussion at c2k10 or so before noting that FreeBSD and
NetBSD did this too.

ok matthew@


Revision tags: OPENBSD_5_5_BASE
# 1.49 24-Jan-2014 guenther

exit1() needs to do a final aggregation of the thread's [us]ticks
and runtime to the process totals. Also, add ktracing of struct
rusage in wait4() and getrusage().

problem pointed out by tedu@
ok deraadt@


# 1.48 21-Jan-2014 tedu

bzero -> memset


# 1.47 20-Jan-2014 guenther

Threads can't be zombies, only processes, so change zombproc to zombprocess,
make it a list of processes, and change P_NOZOMBIE and P_STOPPED from thread
flags to process flags. Add allprocess list for the code that just wants
to see processes.

ok tedu@


# 1.46 25-Oct-2013 guenther

Move the declarations for dogetrusage(), itimerround(), and dowait4()
to sys/*.h headers so that the compat/linux code can use them.
Change dowait4() to not copyout() the status value, but rather leave
that for its caller, as compat/linux has to translate it, with the
side benefit of simplifying the native code.

Originally written months ago as part of the time_t work; long
memory, prodding, and ok from pirofti@


# 1.45 14-Sep-2013 guenther

Eliminate the unused retval argument from dogetrusage()


# 1.44 14-Sep-2013 guenther

Snapshots for all archs have been built, so remove the T32 code


# 1.43 13-Aug-2013 guenther

Switch time_t, ino_t, clock_t, and struct kevent's ident and data
members to 64bit types. Assign new syscall numbers for (almost
all) the syscalls that involve the affected types, including anything
with time_t, timeval, itimerval, timespec, rusage, dirent, stat,
or kevent arguments. Add a d_off member to struct dirent and replace
getdirentries() with getdents(), thus immensely simplifying and
accelerating telldir/seekdir. Build perl with -DBIG_TIME.

Bump the major on every single base library: the compat bits included
here are only good enough to make the transition; the T32 compat
option will be burned as soon as we've reached the new world and
are happy with the snapshots for all architectures.

DANGER: ABI incompatibility. Updating to this kernel requires extra
work or you won't be able to log in: install a snapshot instead.

Much assistance in fixing userland issues from deraadt@ and tedu@
and build assistance from todd@ and otto@


Revision tags: OPENBSD_5_4_BASE
# 1.42 03-Jun-2013 guenther

Convert some internal APIs to use timespecs instead of timevals

ok matthew@ deraadt@


# 1.41 01-Apr-2013 guenther

Make setrlimit() return EINVAL if rlim_cur > rlim_max, per POSIX.
Use limfree() instead of decrementing the reference counter directly.

ok kettenis@
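
The EINVAL rule in 1.41 can be observed from userland. The sketch below
is illustrative only and is not taken from kern_resource.c: the helper
name try_inverted_rlimit() is hypothetical, and the limit values 10 and 5
are arbitrary. It assumes a system that follows the POSIX rule described
in this entry.

```c
#include <errno.h>
#include <sys/resource.h>

/*
 * Hypothetical helper (not from kern_resource.c): request a soft limit
 * that is larger than the hard limit and report what setrlimit(2) says.
 * Returns the errno of the failed call, or 0 if the call succeeded.
 */
int
try_inverted_rlimit(int resource)
{
	struct rlimit rl;

	rl.rlim_cur = 10;	/* soft limit */
	rl.rlim_max = 5;	/* hard limit, deliberately lower */

	if (setrlimit(resource, &rl) == -1)
		return errno;
	return 0;
}
```

On a kernel that applies the check, try_inverted_rlimit(RLIMIT_CORE)
should return EINVAL and leave the existing limits untouched.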


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.40 10-Apr-2012 guenther

Make the KERN_NPROCS and KERN_MAXPROC sysctl()s and the RLIMIT_NPROC rlimit
count processes instead of threads. New sysctl()s KERN_NTHREADS and
KERN_MAXTHREAD count and limit threads. The nprocs and maxproc kernel
variables are replaced by nprocess, maxprocess, nthreads, and maxthread.

ok tedu@ mikeb@


# 1.39 23-Mar-2012 guenther

Make rusage totals, itimers, and profile settings per-process instead
of per-rthread. Handling of per-thread tick and runtime counters
inspired by how FreeBSD does it.

ok kettenis@


# 1.38 19-Mar-2012 guenther

Add tracing and dumping of "pointer to struct" syscall arguments for
structs timespec, timeval, sigaction, and rlimit.

ok otto@ jsing@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.37 07-Mar-2011 guenther

The scheduling 'nice' value is per-process, not per-thread, so move it
into struct process.

ok tedu@ deraadt@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.36 26-Jul-2010 guenther

Correct the links between threads, processes, pgrps, and sessions,
so that the process-level stuff is to/from struct process and not
struct proc. This fixes a bunch of problem cases in rthreads.
Based on earlier work by blambert and myself, but mostly written
at c2k10.

Tested by many: deraadt, sthen, krw, ray, and in snapshots


# 1.35 29-Jun-2010 guenther

Eliminate struct plimit's PL_SHAREMOD flag: it was for COMPAT_IRIX
sproc() support, but we don't have COMPAT_IRIX.
ok krw@ tedu@


Revision tags: OPENBSD_4_7_BASE
# 1.34 04-Jan-2010 guenther

Don't decrement the refcnt on a plimits until after we're done
copying it, so that the process can't sleep in pool_get() and have
the source structure get pool_put() or modified behind its back.

ok deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.33 22-May-2008 thib

Use LIST_FOREACH() instead of handrolling.

From: Pierre Riteau pierre.riteau_att_gmail.com
OK miod@
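
For readers unfamiliar with the queue(3) macros, the kind of change made
in 1.33 looks roughly like the sketch below. It is not the kernel code:
struct item, sum_handrolled() and sum_foreach() are hypothetical names,
and the hand-rolled loop is only an assumption about what such code
typically looked like before the conversion.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element, standing in for a kernel structure. */
struct item {
	int value;
	LIST_ENTRY(item) link;
};
LIST_HEAD(item_list, item);

/* Hand-rolled traversal using the accessor macros directly. */
int
sum_handrolled(struct item_list *head)
{
	struct item *it;
	int sum = 0;

	for (it = LIST_FIRST(head); it != NULL; it = LIST_NEXT(it, link))
		sum += it->value;
	return sum;
}

/* The same traversal written with LIST_FOREACH(). */
int
sum_foreach(struct item_list *head)
{
	struct item *it;
	int sum = 0;

	LIST_FOREACH(it, head, link)
		sum += it->value;
	return sum;
}
```

Both functions visit the same elements in the same order; the macro form
simply hides the loop bookkeeping.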


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.32 12-Apr-2007 tedu

move p_limit and p_cred into struct process
leave macros behind for now to keep the commit small
ok art beck miod pedro


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.31 28-Nov-2005 jsg

ansi/deregister.
'go for it' deraadt@


Revision tags: OPENBSD_3_8_BASE
# 1.30 29-May-2005 deraadt

sched work by niklas and art backed out; causes panics


# 1.29 25-May-2005 niklas

This patch is mostly art's work and was done *a year* ago. Art wants to thank
everyone for the prompt review and ok of this work ;-) Yeah, that includes me
too, or maybe especially me. I am sorry.

Change the sched_lock to a mutex. This fixes, among other things, the infamous
"telnet localhost &" problem. The real bug in that case was that the sched_lock
which is by design a non-recursive lock, was recursively acquired, and not
enough releases made us hold the lock in the idle loop, blocking scheduling
on the other processors. Some of the other processors would hold the biglock though,
which made it impossible for cpu 0 to enter the kernel... A nice deadlock.
Let me just say debugging this for days just to realize that it was all fixed
in an old diff no one ever ok'd was somewhat of an anti-climax.

This diff also changes splsched to be correct for all our architectures.


Revision tags: OPENBSD_3_7_BASE
# 1.28 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_6_BASE
# 1.27 13-Jun-2004 niklas

debranch SMP, have fun


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.26 11-Dec-2003 millert

Add id_t type as per POSIX and use it for [gs]etpriority(2).
OK henning@ and deraadt@


# 1.25 11-Dec-2003 millert

POSIX says rlim_t should be unsigned so make it u_quad_t. Also add
POSIX-mandated RLIM_SAVED_MAX and RLIM_SAVED_CUR defines. On OpenBSD
these are identical to RLIM_INFINITY as allowed by POSIX. OK deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.24 01-Sep-2003 henning

match syscallargs comments with reality
from Patrick Latifi <patrick.l@hermes.usherb.ca>
ok jason@ tedu@


# 1.23 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.22 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.21 15-Oct-2002 nordin

Match reality by changing (u_int) -> (int) in comments.


Revision tags: OPENBSD_3_2_BASE
# 1.20 02-Oct-2002 nordin

branches: 1.20.2;
Check for negative values. Inspiration from tedu <grendel@zeitbombe.org>.
ok deraadt@ and art@


# 1.19 21-Jul-2002 art

Map stack pages without VM_PROT_EXECUTE. Notice that right now this
doesn't do anything since no pmap implements exec protection yet.


Revision tags: OPENBSD_3_1_BASE
# 1.18 25-Jan-2002 art

branches: 1.18.2;
Convert plimit allocations to pool.


# 1.17 20-Dec-2001 nordin

Make user/system times increase monotonically. ok deraadt@ and millert@


Revision tags: UBC_BASE
# 1.16 10-Nov-2001 art

branches: 1.16.2;
Move maxdmap and maxsmap to kern_resource.c


# 1.15 06-Nov-2001 miod

Replace inclusion of <vm/foo.h> with the correct <uvm/bar.h> when necessary.
(Look ma, I might have broken the tree)


Revision tags: OPENBSD_3_0_BASE
# 1.14 27-Jun-2001 art

branches: 1.14.2;
remove old vm


# 1.13 26-May-2001 art

Make it a bit more obvious what dosetrlimit does. (shrink).


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE OPENBSD_2_9_BASE
# 1.12 05-May-2000 art

Add limfree prototype to sys/resourcevar.h.


# 1.11 03-Mar-2000 art

Use LIST_ macros instead of internal field names to walk the allproc list.


Revision tags: SMP_BASE kame_19991208
# 1.10 05-Nov-1999 mickey

branches: 1.10.2;
more stack direction fixes; art@ ok


Revision tags: OPENBSD_2_6_BASE
# 1.9 15-Jul-1999 art

vm_offset_t -> {v,p}addr_t ; vm_size_t -> {v,p}size_t


Revision tags: OPENBSD_2_5_BASE
# 1.8 26-Feb-1999 art

uvm allocation and name changes


Revision tags: OPENBSD_2_1_BASE OPENBSD_2_2_BASE OPENBSD_2_3_BASE OPENBSD_2_4_BASE
# 1.7 24-Nov-1996 millert

Sync with NetBSD. Figure NZERO into priorities and that rlim_cur
and rlim_max are >0.


Revision tags: OPENBSD_2_0_BASE
# 1.6 27-Jul-1996 deraadt

sec can be a long


# 1.5 02-Jul-1996 deraadt

unsigned usec can go negative, should be added in as is; netbsd pr#2585; Juergen.Fluk@lrz.tu-muenchen.de


# 1.4 20-Jun-1996 deraadt

calcru() must calculate using u_quad_t to avoid overflows; netbsd pr#2496, brb@exp.com
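
The class of overflow that 1.4 guards against can be shown with a small
sketch. This is not the calcru() code: scale_narrow() and scale_wide()
are hypothetical names, and the formula (a microsecond total apportioned
by a tick ratio) is only an assumed stand-in for the real calculation.

```c
#include <stdint.h>

/*
 * Hypothetical 32-bit version: the product usec * ticks is computed in
 * 32 bits and wraps around before the division happens.
 */
uint32_t
scale_narrow(uint32_t usec, uint32_t ticks, uint32_t total)
{
	return usec * ticks / total;
}

/*
 * Hypothetical 64-bit version: widening one operand first keeps the
 * intermediate product exact, which is the idea behind using u_quad_t.
 */
uint64_t
scale_wide(uint32_t usec, uint32_t ticks, uint32_t total)
{
	return (uint64_t)usec * ticks / total;
}
```

With usec = 3000000000, ticks = 3 and total = 4, scale_wide() returns
2250000000, while scale_narrow() returns 102516352 because the
intermediate product no longer fits in 32 bits.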


# 1.3 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.2 14-Dec-1995 deraadt

from netbsd; limfree()


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.81 17-Apr-2024 claudio

dogetrusage() must be called with the KERNEL_LOCK held for now.
OK mpi@


Revision tags: OPENBSD_7_4_BASE OPENBSD_7_5_BASE
# 1.80 13-Sep-2023 claudio

Revert commitid: yfAefyNWibUyjkU2, ESyyH5EKxtrXGkS6 and itscfpFvJLOj8mHB;

The change to the single thread API results in crashes inside exit1()
as found by Syzkaller. There seems to be a race in the exit codepath.
What exactly fails is not really clear therefor revert for now.

This should fix the following Syzkaller reports:
Reported-by: syzbot+38efb425eada701ca8bb@syzkaller.appspotmail.com
Reported-by: syzbot+ecc0e8628b3db39b5b17@syzkaller.appspotmail.com
and maybe more.

Reverted commits:


# 1.79 08-Sep-2023 claudio

Change how ps_threads and p_thr_link are locked away from using SCHED_LOCK.

The per process thread list can be traversed (read) by holding either
the KERNEL_LOCK or the per process ps_mtx (instead of SCHED_LOCK).
Abusing the SCHED_LOCK for this makes it impossible to split up the
scheduler lock into something more fine grained.

Tested by phessler@, ok mpi@


# 1.78 29-Aug-2023 claudio

Remove p_rtime from struct proc and replace it by passing the timespec
as argument to the tuagg_locked function.

- Remove incorrect use of p_rtime in other parts of the tree. p_rtime was
almost always 0 so including it in any sum did not alter the result.
- In main() the update of time can be further simplified since at that time
only the primary cpu is running.
- Add missing nanouptime() call in cpu_hatch() for hppa
- Rename tuagg_unlocked to tuagg_locked like it is done in the rest of
the tree.

OK cheloha@ dlg@


Revision tags: OPENBSD_7_3_BASE
# 1.77 04-Feb-2023 cheloha

kernel: stathz is always non-zero after cpu_initclocks()

Now that the clockintr switch is complete, cpu_initclocks() always
initializes stathz to a non-zero value. We don't call statclock()
from hardclock(9) anymore and, more broadly, we don't need to test
whether stathz is non-zero before using it.

With input from kettenis@.

Link: https://marc.info/?l=openbsd-tech&m=167434223309668&w=2

ok kettenis@ miod@


# 1.76 17-Nov-2022 deraadt

stack growth from setrlimit was never updated to set UVM_ET_STACK on
the entries, so the check-sp-at-system-call check failed. Quite strange
it took this long to find this.
ok kettenis


# 1.75 07-Oct-2022 deraadt

Add mimmutable(2) system call which locks the permissions (PROT_*) of
memory mappings so they cannot be changed by a later mmap(), mprotect(),
or munmap(), which will error with EPERM instead.
ok kettenis


Revision tags: OPENBSD_7_2_BASE
# 1.74 28-May-2022 deraadt

oops, wrong value in previous commit


# 1.73 28-May-2022 deraadt

64K of locked memory should be enough for anyone (until we hear a good
reason why)
discussed with many, ok millert


Revision tags: OPENBSD_7_1_BASE
# 1.72 18-Mar-2022 visa

Use the refcnt API with struct plimit.

OK bluhm@ dlg@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.71 08-Feb-2021 mpi

Revert the convertion of per-process thread into a SMR_TAILQ.

We did not reach a consensus about using SMR to unlock single_thread_set()
so there's no point in keeping this change.


# 1.70 07-Dec-2020 mpi

Convert the per-process thread list into a SMR_TAILQ.

Currently all iterations are done under KERNEL_LOCK() and therefor use
the *_LOCKED() variant.

From and ok claudio@


Revision tags: OPENBSD_6_8_BASE
# 1.69 25-Sep-2020 cheloha

setpriority(2): don't treat booleans as scalars

The variable "found" in sys_setpriority() is used as a boolean.
We should set it to 1 to indicate that we found the object we
were looking for instead of incrementing it.

deraadt@ notes that the current code is not buggy, because OpenBSD
cannot support anywhere near 2^32 processes, but agrees that
incrementing the variable signals the wrong thing to the reader.

ok millert@ deraadt@
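
A minimal sketch of the pattern described above, not the code from
sys_setpriority(): visit_matching(), ids, and target are hypothetical
names used only for illustration. The flag is assigned rather than
incremented because the caller only needs to know whether anything
matched.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the lookup loop: visit every id equal to
 * `target' and report ESRCH when nothing matched.
 */
int
visit_matching(const int *ids, size_t n, int target)
{
	int found = 0;		/* boolean: did we see at least one match? */
	size_t i;

	for (i = 0; i < n; i++) {
		if (ids[i] != target)
			continue;
		/* Per-match work would go here. */
		found = 1;	/* set, don't count: only "any match" matters */
	}
	return (found ? 0 : ESRCH);
}
```

Calling visit_matching() on a list that contains the target several
times still returns 0, and a list without it returns ESRCH.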


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.68 15-Jul-2019 mpi

Stop calling resched_proc() after changing the nice(3) value of a process.

Changing the scheduling priority of a process happens rarely, so it isn't
strictly necessary to update the current priority of every thread
instantly.

Moreover resched_proc() isn't well suited to perform this action: it doesn't
consider the state of each thread nor move them to another runqueue.

ok visa@


# 1.67 08-Jul-2019 mpi

Untangle code setting the scheduling priority of a thread.

- `p_estcpu' and `p_usrpri' represent the priority and are now only set
in a single function.

- Call resched_proc() after updating the priority and stop calling it
from schedclock() since `spc_curpriority' should match curproc's priority.

- Rename updatepri() to match decay_cpu() and stop updating per-thread
member.

- Merge two resched_proc() in one inside setrunnable().

Tweak and ok visa@


# 1.66 24-Jun-2019 visa

Guard uvm_map_protect() with kernel lock to prepare dosetrlimit()
for unlocking.

OK semarie@ mpi@ deraadt@ anton@


# 1.65 21-Jun-2019 visa

Make resource limit access MP-safe. So far, the copy-on-write sharing
of resource limit structs has been done between processes. By applying
copy-on-write also between threads, threads can read rlimits in
a nearly lock-free manner.

Inspired by code in DragonFly BSD and FreeBSD.

OK mpi@, agreement from jmatthew@ and anton@


# 1.64 10-Jun-2019 visa

Avoid changing resource limits in rucheck() by introducing a new state
variable that tracks when to send next SIGXCPU. This eases MP work and
prevents accidental alteration of shared resource limit structs.

OK mpi@ semarie@


# 1.63 02-Jun-2019 visa

Move initialization of limit0 into a dedicated function. This new
function is also a proper place for setting up the plimit pool.

While here, raise the IPL of the plimit pool to IPL_MPFLOOR, needed
in upcoming MP work.

OK claudio@


# 1.62 01-Jun-2019 mpi

Revert to using the SCHED_LOCK() to protect time accounting.

It currently creates a lock ordering problem because SCHED_LOCK() is taken
by hardclock(). That means the "priorities" of a thread should be moved
out of the SCHED_LOCK() first in order to make progress.

Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com
via anton@ as well as by kettenis@


# 1.61 31-May-2019 mpi

Use a per-process mutex to protect time accounting instead of SCHED_LOCK().

Note that hardclock(9) still increments p_{u,s,i}ticks without holding a
lock.

ok visa@, cheloha@


# 1.60 31-May-2019 visa

Rename struct plimit field p_refcnt to pl_refcnt to avoid confusion
with the fields of struct proc. Make pl_refcnt unsigned for upcoming
atomic updating.

OK deraadt@ guenther@


Revision tags: OPENBSD_6_5_BASE
# 1.59 06-Jan-2019 visa

Fix unsafe use of ptsignal() in mi_switch().

ptsignal() has to be called with the kernel lock held. As ensuring the
locking in mi_switch() is not easy, and deferring the signaling using
the task API is not possible because of lock order issues in
mi_switch(), move the CPU time checking into a periodic timer where
the kernel can be locked without issues.

With this change, each process has a dedicated resource check timer.
The timer gets activated only when a CPU time limit is set. Because the
checking is not done as frequently as before, some precision is lost.

Use of timers adapted from FreeBSD.

OK tedu@

Reported-by: syzbot+2f5d62256e3280634623@syzkaller.appspotmail.com


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.58 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.

No objection from millert@, ok tedu@, bluhm@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.57 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.56 25-Aug-2016 dlg

pool_setipl

ok kettenis@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.55 05-Dec-2015 tedu

remove stale lint annotations


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.54 09-Feb-2015 miod

Stop using USRSTACK as the edge of the stack, but rather use the vmspace
vm_minsaddr or vm_maxsaddr, depending upon the direction the stack goes in.

This should have no effect on the existing behaviour.

ok kettenis@ deraadt@


# 1.53 19-Dec-2014 tedu

start retiring the nointr allocator. specify PR_WAITOK as a flag as a
marker for which pools are not interrupt safe. ok dlg


# 1.52 10-Dec-2014 tedu

convert bcopy to memcpy. ok millert


# 1.51 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.50 30-Mar-2014 guenther

Eliminates struct pcred by moving the real and saved ugids into
struct ucred; struct process then directly links to the ucred

Based on a discussion at c2k10 or so before noting that FreeBSD and
NetBSD did this too.

ok matthew@


Revision tags: OPENBSD_5_5_BASE
# 1.49 24-Jan-2014 guenther

exit1() needs to do a final aggregation of the thread's [us]ticks
and runtime to the process totals. Also, add ktracing of struct
rusage in wait4() and getrusage().

problem pointed out by tedu@
ok deraadt@


# 1.48 21-Jan-2014 tedu

bzero -> memset


# 1.47 20-Jan-2014 guenther

Threads can't be zombies, only processes, so change zombproc to zombprocess,
make it a list of processes, and change P_NOZOMBIE and P_STOPPED from thread
flags to process flags. Add allprocess list for the code that just wants
to see processes.

ok tedu@


# 1.46 25-Oct-2013 guenther

Move the declarations for dogetrusage(), itimerround(), and dowait4()
to sys/*.h headers so that the compat/linux code can use them.
Change dowait4() to not copyout() the status value, but rather leave
that for its caller, as compat/linux has to translate it, with the
side benefit of simplifying the native code.

Originally written months ago as part of the time_t work; long
memory, prodding, and ok from pirofti@


# 1.45 14-Sep-2013 guenther

Eliminate the unused retval argument from dogetrusage()


# 1.44 14-Sep-2013 guenther

Snapshots for all archs have been built, so remove the T32 code


# 1.43 13-Aug-2013 guenther

Switch time_t, ino_t, clock_t, and struct kevent's ident and data
members to 64bit types. Assign new syscall numbers for (almost
all) the syscalls that involve the affected types, including anything
with time_t, timeval, itimerval, timespec, rusage, dirent, stat,
or kevent arguments. Add a d_off member to struct dirent and replace
getdirentries() with getdents(), thus immensely simplifying and
accelerating telldir/seekdir. Build perl with -DBIG_TIME.

Bump the major on every single base library: the compat bits included
here are only good enough to make the transition; the T32 compat
option will be burned as soon as we've reached the new world and
are happy with the snapshots for all architectures.

DANGER: ABI incompatibility. Updating to this kernel requires extra
work or you won't be able to login: install a snapshot instead.

Much assistance in fixing userland issues from deraadt@ and tedu@
and build assistance from todd@ and otto@


Revision tags: OPENBSD_5_4_BASE
# 1.42 03-Jun-2013 guenther

Convert some internal APIs to use timespecs instead of timevals

ok matthew@ deraadt@


# 1.41 01-Apr-2013 guenther

Make setrlimit() return EINVAL if rlim_cur > rlim_max, per POSIX.
Use limfree() instead of decrementing the reference counter directly.

ok kettenis@
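
The ordering rule above can be sketched as a standalone check. This is
an illustration only, not the kernel's dosetrlimit():
rlimit_check_order() is a hypothetical helper, and only struct rlimit
and EINVAL come from the system headers.

```c
#include <errno.h>
#include <sys/resource.h>

/*
 * Hypothetical validation helper: reject a request whose soft limit
 * (rlim_cur) exceeds its hard limit (rlim_max).
 */
int
rlimit_check_order(const struct rlimit *rl)
{
	if (rl->rlim_cur > rl->rlim_max)
		return (EINVAL);
	return (0);
}
```

From userland, the same condition should show up as setrlimit(2)
failing with errno set to EINVAL when rlim_cur is larger than rlim_max.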


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.40 10-Apr-2012 guenther

Make the KERN_NPROCS and KERN_MAXPROC sysctl()s and the RLIMIT_NPROC rlimit
count processes instead of threads. New sysctl()s KERN_NTHREADS and
KERN_MAXTHREAD count and limit threads. The nprocs and maxproc kernel
variables are replaced by nprocess, maxprocess, nthreads, and maxthread.

ok tedu@ mikeb@


# 1.39 23-Mar-2012 guenther

Make rusage totals, itimers, and profile settings per-process instead
of per-rthread. Handling of per-thread tick and runtime counters
inspired by how FreeBSD does it.

ok kettenis@


# 1.38 19-Mar-2012 guenther

Add tracing and dumping of "pointer to struct" syscall arguments for
structs timespec, timeval, sigaction, and rlimit.

ok otto@ jsing@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.37 07-Mar-2011 guenther

The scheduling 'nice' value is per-process, not per-thread, so move it
into struct process.

ok tedu@ deraadt@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.36 26-Jul-2010 guenther

Correct the links between threads, processes, pgrps, and sessions,
so that the process-level stuff is to/from struct process and not
struct proc. This fixes a bunch of problem cases in rthreads.
Based on earlier work by blambert and myself, but mostly written
at c2k10.

Tested by many: deraadt, sthen, krw, ray, and in snapshots


# 1.35 29-Jun-2010 guenther

Eliminate struct plimit's PL_SHAREMOD flag: it was for COMPAT_IRIX
sproc() support, but we don't have COMPAT_IRIX.
ok krw@ tedu@


Revision tags: OPENBSD_4_7_BASE
# 1.34 04-Jan-2010 guenther

Don't decrement the refcnt on a plimits until after we're done
copying it, so that the process can't sleep in pool_get() and have
the source structure get pool_put() or modified behind its back.

ok deraadt@
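
A userland sketch of the ordering this commit describes, under stated
assumptions: struct limits, limits_copy(), and limits_release() are
hypothetical names, malloc() stands in for a pool_get() that may sleep,
and the plain counter is not safe for concurrent use. The point is only
that the reference on the source is dropped after the copy is finished.

```c
#include <stdlib.h>
#include <string.h>

#define NLIMITS	4	/* hypothetical size, chosen for the example */

struct limits {
	unsigned int refcnt;
	long values[NLIMITS];
};

/* Drop one reference; free the structure when the last one goes away. */
void
limits_release(struct limits *l)
{
	if (--l->refcnt == 0)
		free(l);
}

/*
 * Return a private copy of `old'.  The caller's reference on `old' is
 * kept until the copy is complete, so `old' cannot be freed or reused
 * while the allocation (which may sleep in the kernel analogue) runs.
 * On allocation failure the reference on `old' is left untouched.
 */
struct limits *
limits_copy(struct limits *old)
{
	struct limits *new;

	new = malloc(sizeof(*new));
	if (new == NULL)
		return (NULL);
	memcpy(new->values, old->values, sizeof(new->values));
	new->refcnt = 1;
	limits_release(old);	/* only now give up our reference */
	return (new);
}
```

Releasing the reference before the allocation would open a window in
which another holder could drop the last reference and free the source
while it is still being read.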


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.33 22-May-2008 thib

Use LIST_FOREACH() instead of handrolling.

From: Pierre Riteau pierre.riteau_att_gmail.com
OK miod@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.32 12-Apr-2007 tedu

move p_limit and p_cred into struct process
leave macros behind for now to keep the commit small
ok art beck miod pedro


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.31 28-Nov-2005 jsg

ansi/deregister.
'go for it' deraadt@


Revision tags: OPENBSD_3_8_BASE
# 1.30 29-May-2005 deraadt

sched work by niklas and art backed out; causes panics


# 1.29 25-May-2005 niklas

This patch is mostly art's work and was done *a year* ago. Art wants to thank
everyone for the prompt review and ok of this work ;-) Yeah, that includes me
too, or maybe especially me. I am sorry.

Change the sched_lock to a mutex. This fixes, among other things, the infamous
"telnet localhost &" problem. The real bug in that case was that the sched_lock
which is by design a non-recursive lock, was recursively acquired, and not
enough releases made us hold the lock in the idle loop, blocking scheduling
on the other processors. Some of the other processors would hold the biglock though,
which made it impossible for cpu 0 to enter the kernel... A nice deadlock.
Let me just say debugging this for days just to realize that it was all fixed
in an old diff no one ever ok'd was somewhat of an anti-climax.

This diff also changes splsched to be correct for all our architectures.


Revision tags: OPENBSD_3_7_BASE
# 1.28 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_6_BASE
# 1.27 13-Jun-2004 niklas

debranch SMP, have fun


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.26 11-Dec-2003 millert

Add id_t type as per POSIX and use it for [gs]etpriority(2).
OK henning@ and deraadt@


# 1.25 11-Dec-2003 millert

POSIX says rlim_t should be unsigned so make it u_quad_t. Also add
POSIX-mandated RLIM_SAVED_MAX and RLIM_SAVED_CUR defines. On OpenBSD
these are identical to RLIM_INFINITY as allowed by POSIX. OK deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.24 01-Sep-2003 henning

match syscallargs comments with reality
from Patrick Latifi <patrick.l@hermes.usherb.ca>
ok jason@ tedu@


# 1.23 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.22 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.21 15-Oct-2002 nordin

Match reality by changing (u_int) -> (int) in comments.


Revision tags: OPENBSD_3_2_BASE
# 1.20 02-Oct-2002 nordin

branches: 1.20.2;
Check for negative values. Inspiration from tedu <grendel@zeitbombe.org>.
ok deraadt@ and art@


# 1.19 21-Jul-2002 art

Map stack pages without VM_PROT_EXECUTE. Notice that right now this
doesn't do anything since no pmap implements exec protection yet.


Revision tags: OPENBSD_3_1_BASE
# 1.18 25-Jan-2002 art

branches: 1.18.2;
Convert plimit allocations to pool.


# 1.17 20-Dec-2001 nordin

Make user/system times increase monotonically. ok deraadt@ and millert@


Revision tags: UBC_BASE
# 1.16 10-Nov-2001 art

branches: 1.16.2;
Move maxdmap and maxsmap to kern_resource.c


# 1.15 06-Nov-2001 miod

Replace inclusion of <vm/foo.h> with the correct <uvm/bar.h> when necessary.
(Look ma, I might have broken the tree)


Revision tags: OPENBSD_3_0_BASE
# 1.14 27-Jun-2001 art

branches: 1.14.2;
remove old vm


# 1.13 26-May-2001 art

Make it a bit more obvious what dosetrlimit does. (shrink).


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE OPENBSD_2_9_BASE
# 1.12 05-May-2000 art

Add limfree prototype to sys/resourcevar.h.


# 1.11 03-Mar-2000 art

Use LIST_ macros instead of internal field names to walk the allproc list.


Revision tags: SMP_BASE kame_19991208
# 1.10 05-Nov-1999 mickey

branches: 1.10.2;
more stack direction fixes; art@ ok


Revision tags: OPENBSD_2_6_BASE
# 1.9 15-Jul-1999 art

vm_offset_t -> {v,p}addr_t ; vm_size_t -> {v,p}size_t


Revision tags: OPENBSD_2_5_BASE
# 1.8 26-Feb-1999 art

uvm allocation and name changes


Revision tags: OPENBSD_2_1_BASE OPENBSD_2_2_BASE OPENBSD_2_3_BASE OPENBSD_2_4_BASE
# 1.7 24-Nov-1996 millert

Sync with NetBSD. Figure NZERO into priorities and that rlim_cur
and rlim_max are >0.


Revision tags: OPENBSD_2_0_BASE
# 1.6 27-Jul-1996 deraadt

sec can be a long


# 1.5 02-Jul-1996 deraadt

unsigned usec can go negative, should be added in as is; netbsd pr#2585; Juergen.Fluk@lrz.tu-muenchen.de


# 1.4 20-Jun-1996 deraadt

calcru() must calculate using u_quad_t to avoid overflows; netbsd pr#2496, brb@exp.com
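
A sketch of why the wider type matters, assuming the calculation is a
proportional split of run time by tick counts. split_runtime() and its
parameters are hypothetical, and this is not the calcru() source: the
real function also handles details left out here, such as interrupt
ticks.

```c
#include <stdint.h>

/*
 * Hypothetical helper: split `total_usec' of run time between user and
 * system time in proportion to the sampled tick counts.  The product
 * total_usec * ticks is formed in 64 bits so that it does not wrap the
 * way a 32-bit intermediate would.
 */
void
split_runtime(uint64_t total_usec, uint32_t uticks, uint32_t sticks,
    uint64_t *user_usec, uint64_t *sys_usec)
{
	uint64_t total_ticks = (uint64_t)uticks + sticks;

	if (total_ticks == 0) {
		*user_usec = 0;
		*sys_usec = 0;
		return;
	}
	*sys_usec = (total_usec * sticks) / total_ticks;
	*user_usec = (total_usec * uticks) / total_ticks;
}
```

With 6000 seconds of run time the microsecond count already exceeds
what 32 bits can hold, so the multiplication has to be done in a 64-bit
type. Very large inputs could still overflow 64 bits; the sketch does
not guard against that.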


# 1.3 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.2 14-Dec-1995 deraadt

from netbsd; limfree()


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.80 13-Sep-2023 claudio

Revert commitid: yfAefyNWibUyjkU2, ESyyH5EKxtrXGkS6 and itscfpFvJLOj8mHB;

The change to the single thread API results in crashes inside exit1()
as found by Syzkaller. There seems to be a race in the exit codepath.
What exactly fails is not really clear therefor revert for now.

This should fix the following Syzkaller reports:
Reported-by: syzbot+38efb425eada701ca8bb@syzkaller.appspotmail.com
Reported-by: syzbot+ecc0e8628b3db39b5b17@syzkaller.appspotmail.com
and maybe more.

Reverted commits:


# 1.79 08-Sep-2023 claudio

Change how ps_threads and p_thr_link are locked away from using SCHED_LOCK.

The per process thread list can be traversed (read) by holding either
the KERNEL_LOCK or the per process ps_mtx (instead of SCHED_LOCK).
Abusing the SCHED_LOCK for this makes it impossible to split up the
scheduler lock into something more fine grained.

Tested by phessler@, ok mpi@


# 1.78 29-Aug-2023 claudio

Remove p_rtime from struct proc and replace it by passing the timespec
as argument to the tuagg_locked function.

- Remove incorrect use of p_rtime in other parts of the tree. p_rtime was
almost always 0 so including it in any sum did not alter the result.
- In main() the update of time can be further simplified since at that time
only the primary cpu is running.
- Add missing nanouptime() call in cpu_hatch() for hppa
- Rename tuagg_unlocked to tuagg_locked like it is done in the rest of
the tree.

OK cheloha@ dlg@


Revision tags: OPENBSD_7_3_BASE
# 1.77 04-Feb-2023 cheloha

kernel: stathz is always non-zero after cpu_initclocks()

Now that the clockintr switch is complete, cpu_initclocks() always
initializes stathz to a non-zero value. We don't call statclock()
from hardclock(9) anymore and, more broadly, we don't need to test
whether stathz is non-zero before using it.

With input from kettenis@.

Link: https://marc.info/?l=openbsd-tech&m=167434223309668&w=2

ok kettenis@ miod@


# 1.76 17-Nov-2022 deraadt

stack growth from setrlimit was never updated to set UVM_ET_STACK on
the entries, so the check-sp-at-system-call check failed. Quite strange
it took this long to find this.
ok kettenis


# 1.75 07-Oct-2022 deraadt

Add mimmutable(2) system call which locks the permissions (PROT_*) of
memory mappings so they cannot be changed by a later mmap(), mprotect(),
or munmap(), which will error with EPERM instead.
ok kettenis


Revision tags: OPENBSD_7_2_BASE
# 1.74 28-May-2022 deraadt

oops, wrong value in previous commit


# 1.73 28-May-2022 deraadt

64K of locked memory should be enough for anyone (until we hear a good
reason why)
discussed with many, ok millert


Revision tags: OPENBSD_7_1_BASE
# 1.72 18-Mar-2022 visa

Use the refcnt API with struct plimit.

OK bluhm@ dlg@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.71 08-Feb-2021 mpi

Revert the convertion of per-process thread into a SMR_TAILQ.

We did not reach a consensus about using SMR to unlock single_thread_set()
so there's no point in keeping this change.


# 1.70 07-Dec-2020 mpi

Convert the per-process thread list into a SMR_TAILQ.

Currently all iterations are done under KERNEL_LOCK() and therefor use
the *_LOCKED() variant.

From and ok claudio@


Revision tags: OPENBSD_6_8_BASE
# 1.69 25-Sep-2020 cheloha

setpriority(2): don't treat booleans as scalars

The variable "found" in sys_setpriority() is used as a boolean.
We should set it to 1 to indicate that we found the object we
were looking for instead of incrementing it.

deraadt@ notes that the current code is not buggy, because OpenBSD
cannot support anywhere near 2^32 processes, but agrees that
incrementing the variable signals the wrong thing to the reader.

ok millert@ deraadt@


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.68 15-Jul-2019 mpi

Stop calling resched_proc() after changing the nice(3) value of a process.

Changing the scheduling priority of a process happens rarely, so it isn't
strictly necessary to update the current priority of every threads
instantly.

Moreover resched_proc() isn't well suited to perform this action: it doesn't
consider the state of each thread nor move them to another runqueue.

ok visa@


# 1.67 08-Jul-2019 mpi

Untangle code setting the scheduling priority of a thread.

- `p_estcpu' and `p_usrpri' represent the priority and are now only set
in a single function.

- Call resched_proc() after updating the priority and stop calling it
from schedclock() since `spc_curpriority' should match curproc's priority.

- Rename updatepri() to match decay_cpu() and stop updating per-thread
member.

- Merge two resched_proc() in one inside setrunnable().

Tweak and ok visa@


# 1.66 24-Jun-2019 visa

Guard uvm_map_protect() with kernel lock to prepare dosetrlimit()
for unlocking.

OK semarie@ mpi@ deraadt@ anton@


# 1.65 21-Jun-2019 visa

Make resource limit access MP-safe. So far, the copy-on-write sharing
of resource limit structs has been done between processes. By applying
copy-on-write also between threads, threads can read rlimits in
a nearly lock-free manner.

Inspired by code in DragonFly BSD and FreeBSD.

OK mpi@, agreement from jmatthew@ and anton@


# 1.64 10-Jun-2019 visa

Avoid changing resource limits in rucheck() by introducing a new state
variable that tracks when to send next SIGXCPU. This eases MP work and
prevents accidental alteration of shared resource limit structs.

OK mpi@ semarie@


# 1.63 02-Jun-2019 visa

Move initialization of limit0 into a dedicated function. This new
function is also a proper place for setting up the plimit pool.

While here, raise the IPL of the plimit pool to IPL_MPFLOOR, needed
in upcoming MP work.

OK claudio@


# 1.62 01-Jun-2019 mpi

Revert to using the SCHED_LOCK() to protect time accounting.

It currently creates a lock ordering problem because SCHED_LOCK() is taken
by hardclock(). That means the "priorities" of a thread should be moved
out of the SCHED_LOCK() first in order to make progress.

Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com
via anton@ as well as by kettenis@


# 1.61 31-May-2019 mpi

Use a per-process mutex to protect time accounting instead of SCHED_LOCK().

Note that hardclock(9) still increments p_{u,s,i}ticks without holding a
lock.

ok visa@, cheloha@


# 1.60 31-May-2019 visa

Rename struct plimit field p_refcnt to pl_refcnt to avoid confusion
with the fields of struct proc. Make pl_refcnt unsigned for upcoming
atomic updating.

OK deraadt@ guenther@


Revision tags: OPENBSD_6_5_BASE
# 1.59 06-Jan-2019 visa

Fix unsafe use of ptsignal() in mi_switch().

ptsignal() has to be called with the kernel lock held. As ensuring the
locking in mi_switch() is not easy, and deferring the signaling using
the task API is not possible because of lock order issues in
mi_switch(), move the CPU time checking into a periodic timer where
the kernel can be locked without issues.

With this change, each process has a dedicated resource check timer.
The timer gets activated only when a CPU time limit is set. Because the
checking is not done as frequently as before, some precision is lost.

Use of timers adapted from FreeBSD.

OK tedu@

Reported-by: syzbot+2f5d62256e3280634623@syzkaller.appspotmail.com


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.58 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer mess with a per-process field.

No objection from millert@, ok tedu@, bluhm@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.57 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.56 25-Aug-2016 dlg

pool_setipl

ok kettenis@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.55 05-Dec-2015 tedu

remove stale lint annotations


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.54 09-Feb-2015 miod

Stop using USRSTACK as the edge of the stack, but rather use the vmspace
vm_minsaddr or vm_maxsaddr, depending upon the direction the stack goes in.

This should have no effect on the existing behaviourrr.

ok kettenis@ deraadt@


# 1.53 19-Dec-2014 tedu

start retiring the nointr allocator. specify PR_WAITOK as a flag as a
marker for which pools are not interrupt safe. ok dlg


# 1.52 10-Dec-2014 tedu

convert bcopy to memcpy. ok millert


# 1.51 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.50 30-Mar-2014 guenther

Eliminates struct pcred by moving the real and saved ugids into
struct ucred; struct process then directly links to the ucred

Based on a discussion at c2k10 or so before noting that FreeBSD and
NetBSD did this too.

ok matthew@


Revision tags: OPENBSD_5_5_BASE
# 1.49 24-Jan-2014 guenther

exit1() needs to do a final aggregation of the thread's [us]ticks
and runtime to the process totals. Also, add ktracing of struct
rusage in wait4() and getrusage().

problem pointed out by tedu@
ok deraadt@


# 1.48 21-Jan-2014 tedu

bzero -> memset


# 1.47 20-Jan-2014 guenther

Threads can't be zombies, only processes, so change zombproc to zombprocess,
make it a list of processes, and change P_NOZOMBIE and P_STOPPED from thread
flags to process flags. Add allprocess list for the code that just wants
to see processes.

ok tedu@


# 1.46 25-Oct-2013 guenther

Move the declarations for dogetrusage(), itimerround(), and dowait4()
to sys/*.h headers so that the compat/linux code can use them.
Change dowait4() to not copyout() the status value, but rather leave
that for its caller, as compat/linux has to translate it, with the
side benefit of simplifying the native code.

Originally written months ago as part of the time_t work; long
memory, prodding, and ok from pirofti@


# 1.45 14-Sep-2013 guenther

Eliminate the unused retval argument from dogetrusage()


# 1.44 14-Sep-2013 guenther

Snapshots for all archs have been built, so remove the T32 code


# 1.43 13-Aug-2013 guenther

Switch time_t, ino_t, clock_t, and struct kevent's ident and data
members to 64bit types. Assign new syscall numbers for (almost
all) the syscalls that involve the affected types, including anything
with time_t, timeval, itimerval, timespec, rusage, dirent, stat,
or kevent arguments. Add a d_off member to struct dirent and replace
getdirentries() with getdents(), thus immensely simplifying and
accelerating telldir/seekdir. Build perl with -DBIG_TIME.

Bump the major on every single base library: the compat bits included
here are only good enough to make the transition; the T32 compat
option will be burned as soon as we've reached the new world are
are happy with the snapshots for all architectures.

DANGER: ABI incompatibility. Updating to this kernel requires extra
work or you won't be able to login: install a snapshot instead.

Much assistance in fixing userland issues from deraadt@ and tedu@
and build assistance from todd@ and otto@


Revision tags: OPENBSD_5_4_BASE
# 1.42 03-Jun-2013 guenther

Convert some internal APIs to use timespecs instead of timevals

ok matthew@ deraadt@


# 1.41 01-Apr-2013 guenther

Make setrlimit() return EINVAL if rlim_cur > rlim_max, per POSIX.
Use limfree() instead of decrementing the reference counter directly.

ok kettenis@


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.40 10-Apr-2012 guenther

Make the KERN_NPROCS and KERN_MAXPROC sysctl()s and the RLIMIT_NPROC rlimit
count processes instead of threads. New sysctl()s KERN_NTHREADS and
KERN_MAXTHREAD count and limit threads. The nprocs and maxproc kernel
variables are replaced by nprocess, maxprocess, nthreads, and maxthread.

ok tedu@ mikeb@


# 1.39 23-Mar-2012 guenther

Make rusage totals, itimers, and profile settings per-process instead
of per-rthread. Handling of per-thread tick and runtime counters
inspired by how FreeBSD does it.

ok kettenis@


# 1.38 19-Mar-2012 guenther

Add tracing and dumping of "pointer to struct" syscall arguments for
structs timespec, timeval, sigaction, and rlimit.

ok otto@ jsing@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.37 07-Mar-2011 guenther

The scheduling 'nice' value is per-process, not per-thread, so move it
into struct process.

ok tedu@ deraadt@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.36 26-Jul-2010 guenther

Correct the links between threads, processes, pgrps, and sessions,
so that the process-level stuff is to/from struct process and not
struct proc. This fixes a bunch of problem cases in rthreads.
Based on earlier work by blambert and myself, but mostly written
at c2k10.

Tested by many: deraadt, sthen, krw, ray, and in snapshots


# 1.35 29-Jun-2010 guenther

Eliminate struct plimit's PL_SHAREMOD flag: it was for COMPAT_IRIX
sproc() support, but we don't have COMPAT_IRIX.
ok krw@ tedu@


Revision tags: OPENBSD_4_7_BASE
# 1.34 04-Jan-2010 guenther

Don't decrement the refcnt on a plimits until after we're done
copying it, so that the process can't sleep in pool_get() and have
the source structure get pool_put() or modified behind its back.

ok deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.33 22-May-2008 thib

Use LIST_FOREACH() instead of handrolling.

From: Pierre Riteau pierre.riteau_att_gmail.com
OK miod@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.32 12-Apr-2007 tedu

move p_limit and p_cred into struct process
leave macros behind for now to keep the commit small
ok art beck miod pedro


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.31 28-Nov-2005 jsg

ansi/deregister.
'go for it' deraadt@


Revision tags: OPENBSD_3_8_BASE
# 1.30 29-May-2005 deraadt

sched work by niklas and art backed out; causes panics


# 1.29 25-May-2005 niklas

This patch is mortly art's work and was done *a year* ago. Art wants to thank
everyone for the prompt review and ok of this work ;-) Yeah, that includes me
too, or maybe especially me. I am sorry.

Change the sched_lock to a mutex. This fixes, among other things, the infamous
"telnet localhost &" problem. The real bug in that case was that the sched_lock
which is by design a non-recursive lock, was recursively acquired, and not
enough releases made us hold the lock in the idle loop, blocking scheduling
on the other processors. Some of the other processors would hold the biglock though,
which made it impossible for cpu 0 to enter the kernel... A nice deadlock.
Let me just say debugging this for days just to realize that it was all fixed
in an old diff noone ever ok'd was somewhat of an anti-climax.

This diff also changes splsched to be correct for all our architectures.


Revision tags: OPENBSD_3_7_BASE
# 1.28 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_6_BASE
# 1.27 13-Jun-2004 niklas

debranch SMP, have fun


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.26 11-Dec-2003 millert

Add id_t type as per POSIX and use it for [gs]etpriority(2).
OK henning@ and deraadt@


# 1.25 11-Dec-2003 millert

POSIX says rlim_t should be unsigned so make it u_quad_t. Also add
POSIX-mandated RLIM_SAVED_MAX and RLIM_SAVED_CUR defines. On OpenBSD
these are identical to RLIM_INFINITY as allowed by POSIX. OK deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.24 01-Sep-2003 henning

match syscallargs comments with reality
from Patrick Latifi <patrick.l@hermes.usherb.ca>
ok jason@ tedu@


# 1.23 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls user suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.22 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.21 15-Oct-2002 nordin

Match reality by changing (u_int) -> (int) in comments.


Revision tags: OPENBSD_3_2_BASE
# 1.20 02-Oct-2002 nordin

branches: 1.20.2;
Check for negative values. Inspiration from tedu <grendel@zeitbombe.org>.
ok deraadt@ and art@


# 1.19 21-Jul-2002 art

Map stack pages without VM_PROT_EXECUTE. Notice that right now this
doesn't do anything since no pmap implements exec protection yet.


Revision tags: OPENBSD_3_1_BASE
# 1.18 25-Jan-2002 art

branches: 1.18.2;
Convert plimit allocations to pool.


# 1.17 20-Dec-2001 nordin

Make user/system times increase monotonically. ok deraadt@ and millert@


Revision tags: UBC_BASE
# 1.16 10-Nov-2001 art

branches: 1.16.2;
Move maxdmap and maxsmap to kern_resource.c


# 1.15 06-Nov-2001 miod

Replace inclusion of <vm/foo.h> with the correct <uvm/bar.h> when necessary.
(Look ma, I might have broken the tree)


Revision tags: OPENBSD_3_0_BASE
# 1.14 27-Jun-2001 art

branches: 1.14.2;
remove old vm


# 1.13 26-May-2001 art

Make it a bit more obvious what dosetrlimit does. (shrink).


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE OPENBSD_2_9_BASE
# 1.12 05-May-2000 art

Add limfree prototype to sys/resourcevar.h.


# 1.11 03-Mar-2000 art

Use LIST_ macros instead of internal field names to walk the allproc list.


Revision tags: SMP_BASE kame_19991208
# 1.10 05-Nov-1999 mickey

branches: 1.10.2;
more stack direction fixes; art@ ok


Revision tags: OPENBSD_2_6_BASE
# 1.9 15-Jul-1999 art

vm_offset_t -> {v,p}addr_t ; vm_size_t -> {v,p}size_t


Revision tags: OPENBSD_2_5_BASE
# 1.8 26-Feb-1999 art

uvm allocation and name changes


Revision tags: OPENBSD_2_1_BASE OPENBSD_2_2_BASE OPENBSD_2_3_BASE OPENBSD_2_4_BASE
# 1.7 24-Nov-1996 millert

Sync with NetBSD. Figure NZERO into priorities and that rlim_cur
and rlim_max are >0.


Revision tags: OPENBSD_2_0_BASE
# 1.6 27-Jul-1996 deraadt

sec can be a long


# 1.5 02-Jul-1996 deraadt

unsigned usec can go negative, should be added in as is; netbsd pr#2585; Juergen.Fluk@lrz.tu-muenchen.de


# 1.4 20-Jun-1996 deraadt

calcru() must calculate using u_quad_t to avoid overflows; netbsd pr#2496, brb@exp.com


# 1.3 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.2 14-Dec-1995 deraadt

from netbsd; limfree()


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.78 29-Aug-2023 claudio

Remove p_rtime from struct proc and replace it by passing the timespec
as argument to the tuagg_locked function.

- Remove incorrect use of p_rtime in other parts of the tree. p_rtime was
almost always 0 so including it in any sum did not alter the result.
- In main() the update of time can be further simplified since at that time
only the primary cpu is running.
- Add missing nanouptime() call in cpu_hatch() for hppa
- Rename tuagg_unlocked to tuagg_locked like it is done in the rest of
the tree.

OK cheloha@ dlg@


Revision tags: OPENBSD_7_3_BASE
# 1.77 04-Feb-2023 cheloha

kernel: stathz is always non-zero after cpu_initclocks()

Now that the clockintr switch is complete, cpu_initclocks() always
initializes stathz to a non-zero value. We don't call statclock()
from hardclock(9) anymore and, more broadly, we don't need to test
whether stathz is non-zero before using it.

With input from kettenis@.

Link: https://marc.info/?l=openbsd-tech&m=167434223309668&w=2

ok kettenis@ miod@


# 1.76 17-Nov-2022 deraadt

stack growth from setrlimit was never updated to set UVM_ET_STACK on
the entries, so the check-sp-at-system-call check failed. Quite strange
it took this long to find this.
ok kettenis


# 1.75 07-Oct-2022 deraadt

Add mimmutable(2) system call which locks the permissions (PROT_*) of
memory mappings so they cannot be changed by a later mmap(), mprotect(),
or munmap(), which will error with EPERM instead.
ok kettenis


Revision tags: OPENBSD_7_2_BASE
# 1.74 28-May-2022 deraadt

oops, wrong value in previous commit


# 1.73 28-May-2022 deraadt

64K of locked memory should be enough for anyone (until we hear a good
reason why)
discussed with many, ok millert


Revision tags: OPENBSD_7_1_BASE
# 1.72 18-Mar-2022 visa

Use the refcnt API with struct plimit.

OK bluhm@ dlg@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.71 08-Feb-2021 mpi

Revert the convertion of per-process thread into a SMR_TAILQ.

We did not reach a consensus about using SMR to unlock single_thread_set()
so there's no point in keeping this change.


# 1.70 07-Dec-2020 mpi

Convert the per-process thread list into a SMR_TAILQ.

Currently all iterations are done under KERNEL_LOCK() and therefor use
the *_LOCKED() variant.

From and ok claudio@


Revision tags: OPENBSD_6_8_BASE
# 1.69 25-Sep-2020 cheloha

setpriority(2): don't treat booleans as scalars

The variable "found" in sys_setpriority() is used as a boolean.
We should set it to 1 to indicate that we found the object we
were looking for instead of incrementing it.

deraadt@ notes that the current code is not buggy, because OpenBSD
cannot support anywhere near 2^32 processes, but agrees that
incrementing the variable signals the wrong thing to the reader.

ok millert@ deraadt@


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.68 15-Jul-2019 mpi

Stop calling resched_proc() after changing the nice(3) value of a process.

Changing the scheduling priority of a process happens rarely, so it isn't
strictly necessary to update the current priority of every threads
instantly.

Moreover resched_proc() isn't well suited to perform this action: it doesn't
consider the state of each thread nor move them to another runqueue.

ok visa@


# 1.67 08-Jul-2019 mpi

Untangle code setting the scheduling priority of a thread.

- `p_estcpu' and `p_usrpri' represent the priority and are now only set
in a single function.

- Call resched_proc() after updating the priority and stop calling it
from schedclock() since `spc_curpriority' should match curproc's priority.

- Rename updatepri() to match decay_cpu() and stop updating per-thread
member.

- Merge two resched_proc() in one inside setrunnable().

Tweak and ok visa@


# 1.66 24-Jun-2019 visa

Guard uvm_map_protect() with kernel lock to prepare dosetrlimit()
for unlocking.

OK semarie@ mpi@ deraadt@ anton@


# 1.65 21-Jun-2019 visa

Make resource limit access MP-safe. So far, the copy-on-write sharing
of resource limit structs has been done between processes. By applying
copy-on-write also between threads, threads can read rlimits in
a nearly lock-free manner.

Inspired by code in DragonFly BSD and FreeBSD.

OK mpi@, agreement from jmatthew@ and anton@


# 1.64 10-Jun-2019 visa

Avoid changing resource limits in rucheck() by introducing a new state
variable that tracks when to send next SIGXCPU. This eases MP work and
prevents accidental alteration of shared resource limit structs.

OK mpi@ semarie@


# 1.63 02-Jun-2019 visa

Move initialization of limit0 into a dedicated function. This new
function is also a proper place for setting up the plimit pool.

While here, raise the IPL of the plimit pool to IPL_MPFLOOR, needed
in upcoming MP work.

OK claudio@


# 1.62 01-Jun-2019 mpi

Revert to using the SCHED_LOCK() to protect time accounting.

It currently creates a lock ordering problem because SCHED_LOCK() is taken
by hardclock(). That means the "priorities" of a thread should be moved
out of the SCHED_LOCK() first in order to make progress.

Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com
via anton@ as well as by kettenis@


# 1.61 31-May-2019 mpi

Use a per-process mutex to protect time accounting instead of SCHED_LOCK().

Note that hardclock(9) still increments p_{u,s,i}ticks without holding a
lock.

ok visa@, cheloha@


# 1.60 31-May-2019 visa

Rename struct plimit field p_refcnt to pl_refcnt to avoid confusion
with the fields of struct proc. Make pl_refcnt unsigned for upcoming
atomic updating.

OK deraadt@ guenther@


Revision tags: OPENBSD_6_5_BASE
# 1.59 06-Jan-2019 visa

Fix unsafe use of ptsignal() in mi_switch().

ptsignal() has to be called with the kernel lock held. As ensuring the
locking in mi_switch() is not easy, and deferring the signaling using
the task API is not possible because of lock order issues in
mi_switch(), move the CPU time checking into a periodic timer where
the kernel can be locked without issues.

With this change, each process has a dedicated resource check timer.
The timer gets activated only when a CPU time limit is set. Because the
checking is not done as frequently as before, some precision is lost.

Use of timers adapted from FreeBSD.

OK tedu@

Reported-by: syzbot+2f5d62256e3280634623@syzkaller.appspotmail.com


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.58 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer mess with a per-process field.

No objection from millert@, ok tedu@, bluhm@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.57 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.56 25-Aug-2016 dlg

pool_setipl

ok kettenis@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.55 05-Dec-2015 tedu

remove stale lint annotations


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.54 09-Feb-2015 miod

Stop using USRSTACK as the edge of the stack, but rather use the vmspace
vm_minsaddr or vm_maxsaddr, depending upon the direction the stack goes in.

This should have no effect on the existing behaviourrr.

ok kettenis@ deraadt@


# 1.53 19-Dec-2014 tedu

start retiring the nointr allocator. specify PR_WAITOK as a flag as a
marker for which pools are not interrupt safe. ok dlg


# 1.52 10-Dec-2014 tedu

convert bcopy to memcpy. ok millert


# 1.51 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.50 30-Mar-2014 guenther

Eliminates struct pcred by moving the real and saved ugids into
struct ucred; struct process then directly links to the ucred

Based on a discussion at c2k10 or so before noting that FreeBSD and
NetBSD did this too.

ok matthew@


Revision tags: OPENBSD_5_5_BASE
# 1.49 24-Jan-2014 guenther

exit1() needs to do a final aggregation of the thread's [us]ticks
and runtime to the process totals. Also, add ktracing of struct
rusage in wait4() and getrusage().

problem pointed out by tedu@
ok deraadt@


# 1.48 21-Jan-2014 tedu

bzero -> memset


# 1.47 20-Jan-2014 guenther

Threads can't be zombies, only processes, so change zombproc to zombprocess,
make it a list of processes, and change P_NOZOMBIE and P_STOPPED from thread
flags to process flags. Add allprocess list for the code that just wants
to see processes.

ok tedu@


# 1.46 25-Oct-2013 guenther

Move the declarations for dogetrusage(), itimerround(), and dowait4()
to sys/*.h headers so that the compat/linux code can use them.
Change dowait4() to not copyout() the status value, but rather leave
that for its caller, as compat/linux has to translate it, with the
side benefit of simplifying the native code.

Originally written months ago as part of the time_t work; long
memory, prodding, and ok from pirofti@


# 1.45 14-Sep-2013 guenther

Eliminate the unused retval argument from dogetrusage()


# 1.44 14-Sep-2013 guenther

Snapshots for all archs have been built, so remove the T32 code


# 1.43 13-Aug-2013 guenther

Switch time_t, ino_t, clock_t, and struct kevent's ident and data
members to 64bit types. Assign new syscall numbers for (almost
all) the syscalls that involve the affected types, including anything
with time_t, timeval, itimerval, timespec, rusage, dirent, stat,
or kevent arguments. Add a d_off member to struct dirent and replace
getdirentries() with getdents(), thus immensely simplifying and
accelerating telldir/seekdir. Build perl with -DBIG_TIME.

Bump the major on every single base library: the compat bits included
here are only good enough to make the transition; the T32 compat
option will be burned as soon as we've reached the new world are
are happy with the snapshots for all architectures.

DANGER: ABI incompatibility. Updating to this kernel requires extra
work or you won't be able to login: install a snapshot instead.

Much assistance in fixing userland issues from deraadt@ and tedu@
and build assistance from todd@ and otto@


Revision tags: OPENBSD_5_4_BASE
# 1.42 03-Jun-2013 guenther

Convert some internal APIs to use timespecs instead of timevals

ok matthew@ deraadt@


# 1.41 01-Apr-2013 guenther

Make setrlimit() return EINVAL if rlim_cur > rlim_max, per POSIX.
Use limfree() instead of decrementing the reference counter directly.

ok kettenis@
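
The EINVAL behaviour described in 1.41 can be observed from userland. The following is a minimal sketch, not code from this commit: the helper name inverted_rlimit_errno() and the choice of RLIMIT_CORE with the values 2 and 1 are assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>

/*
 * Hypothetical helper (not part of kern_resource.c): submit a request
 * whose soft limit exceeds its hard limit and return the errno that
 * setrlimit(2) reports, or 0 if the call was accepted.
 */
int
inverted_rlimit_errno(void)
{
	struct rlimit rl;

	rl.rlim_cur = 2;	/* soft limit */
	rl.rlim_max = 1;	/* hard limit, deliberately lower */
	assert(rl.rlim_cur > rl.rlim_max);	/* inverted by construction */

	errno = 0;
	if (setrlimit(RLIMIT_CORE, &rl) == 0)
		return 0;	/* request was accepted */
	return errno;
}
```

On a kernel that follows POSIX here, the helper is expected to return EINVAL, and the rejected request should leave the existing limits untouched.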


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.40 10-Apr-2012 guenther

Make the KERN_NPROCS and KERN_MAXPROC sysctl()s and the RLIMIT_NPROC rlimit
count processes instead of threads. New sysctl()s KERN_NTHREADS and
KERN_MAXTHREAD count and limit threads. The nprocs and maxproc kernel
variables are replaced by nprocess, maxprocess, nthreads, and maxthread.

ok tedu@ mikeb@


# 1.39 23-Mar-2012 guenther

Make rusage totals, itimers, and profile settings per-process instead
of per-rthread. Handling of per-thread tick and runtime counters
inspired by how FreeBSD does it.

ok kettenis@


# 1.38 19-Mar-2012 guenther

Add tracing and dumping of "pointer to struct" syscall arguments for
structs timespec, timeval, sigaction, and rlimit.

ok otto@ jsing@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.37 07-Mar-2011 guenther

The scheduling 'nice' value is per-process, not per-thread, so move it
into struct process.

ok tedu@ deraadt@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.36 26-Jul-2010 guenther

Correct the links between threads, processes, pgrps, and sessions,
so that the process-level stuff is to/from struct process and not
struct proc. This fixes a bunch of problem cases in rthreads.
Based on earlier work by blambert and myself, but mostly written
at c2k10.

Tested by many: deraadt, sthen, krw, ray, and in snapshots


# 1.35 29-Jun-2010 guenther

Eliminate struct plimit's PL_SHAREMOD flag: it was for COMPAT_IRIX
sproc() support, but we don't have COMPAT_IRIX.
ok krw@ tedu@


Revision tags: OPENBSD_4_7_BASE
# 1.34 04-Jan-2010 guenther

Don't decrement the refcnt on a plimits until after we're done
copying it, so that the process can't sleep in pool_get() and have
the source structure get pool_put() or modified behind its back.

ok deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.33 22-May-2008 thib

Use LIST_FOREACH() instead of handrolling.

From: Pierre Riteau pierre.riteau_att_gmail.com
OK miod@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.32 12-Apr-2007 tedu

move p_limit and p_cred into struct process
leave macros behind for now to keep the commit small
ok art beck miod pedro


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.31 28-Nov-2005 jsg

ansi/deregister.
'go for it' deraadt@


Revision tags: OPENBSD_3_8_BASE
# 1.30 29-May-2005 deraadt

sched work by niklas and art backed out; causes panics


# 1.29 25-May-2005 niklas

This patch is mostly art's work and was done *a year* ago. Art wants to thank
everyone for the prompt review and ok of this work ;-) Yeah, that includes me
too, or maybe especially me. I am sorry.

Change the sched_lock to a mutex. This fixes, among other things, the infamous
"telnet localhost &" problem. The real bug in that case was that the sched_lock
which is by design a non-recursive lock, was recursively acquired, and not
enough releases made us hold the lock in the idle loop, blocking scheduling
on the other processors. Some of the other processors would hold the biglock though,
which made it impossible for cpu 0 to enter the kernel... A nice deadlock.
Let me just say debugging this for days just to realize that it was all fixed
in an old diff no one ever ok'd was somewhat of an anti-climax.

This diff also changes splsched to be correct for all our architectures.


Revision tags: OPENBSD_3_7_BASE
# 1.28 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_6_BASE
# 1.27 13-Jun-2004 niklas

debranch SMP, have fun


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.26 11-Dec-2003 millert

Add id_t type as per POSIX and use it for [gs]etpriority(2).
OK henning@ and deraadt@


# 1.25 11-Dec-2003 millert

POSIX says rlim_t should be unsigned so make it u_quad_t. Also add
POSIX-mandated RLIM_SAVED_MAX and RLIM_SAVED_CUR defines. On OpenBSD
these are identical to RLIM_INFINITY as allowed by POSIX. OK deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.24 01-Sep-2003 henning

match syscallargs comments with reality
from Patrick Latifi <patrick.l@hermes.usherb.ca>
ok jason@ tedu@


# 1.23 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.22 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.21 15-Oct-2002 nordin

Match reality by changing (u_int) -> (int) in comments.


Revision tags: OPENBSD_3_2_BASE
# 1.20 02-Oct-2002 nordin

branches: 1.20.2;
Check for negative values. Inspiration from tedu <grendel@zeitbombe.org>.
ok deraadt@ and art@


# 1.19 21-Jul-2002 art

Map stack pages without VM_PROT_EXECUTE. Notice that right now this
doesn't do anything since no pmap implements exec protection yet.


Revision tags: OPENBSD_3_1_BASE
# 1.18 25-Jan-2002 art

branches: 1.18.2;
Convert plimit allocations to pool.


# 1.17 20-Dec-2001 nordin

Make user/system times increase monotonically. ok deraadt@ and millert@


Revision tags: UBC_BASE
# 1.16 10-Nov-2001 art

branches: 1.16.2;
Move maxdmap and maxsmap to kern_resource.c


# 1.15 06-Nov-2001 miod

Replace inclusion of <vm/foo.h> with the correct <uvm/bar.h> when necessary.
(Look ma, I might have broken the tree)


Revision tags: OPENBSD_3_0_BASE
# 1.14 27-Jun-2001 art

branches: 1.14.2;
remove old vm


# 1.13 26-May-2001 art

Make it a bit more obvious what dosetrlimit does. (shrink).


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE OPENBSD_2_9_BASE
# 1.12 05-May-2000 art

Add limfree prototype to sys/resourcevar.h.


# 1.11 03-Mar-2000 art

Use LIST_ macros instead of internal field names to walk the allproc list.


Revision tags: SMP_BASE kame_19991208
# 1.10 05-Nov-1999 mickey

branches: 1.10.2;
more stack direction fixes; art@ ok


Revision tags: OPENBSD_2_6_BASE
# 1.9 15-Jul-1999 art

vm_offset_t -> {v,p}addr_t ; vm_size_t -> {v,p}size_t


Revision tags: OPENBSD_2_5_BASE
# 1.8 26-Feb-1999 art

uvm allocation and name changes


Revision tags: OPENBSD_2_1_BASE OPENBSD_2_2_BASE OPENBSD_2_3_BASE OPENBSD_2_4_BASE
# 1.7 24-Nov-1996 millert

Sync with NetBSD. Figure NZERO into priorities and that rlim_cur
and rlim_max are >0.


Revision tags: OPENBSD_2_0_BASE
# 1.6 27-Jul-1996 deraadt

sec can be a long


# 1.5 02-Jul-1996 deraadt

unsigned usec can go negative, should be added in as is; netbsd pr#2585; Juergen.Fluk@lrz.tu-muenchen.de


# 1.4 20-Jun-1996 deraadt

calcru() must calculate using u_quad_t to avoid overflows; netbsd pr#2496, brb@exp.com
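
The overflow that 1.4 refers to can be illustrated with a small worked example. This is a sketch under stated assumptions, not the calcru() code: the function names share_usec_32() and share_usec_64(), and the exact expression (total microseconds times ticks, divided by total ticks), are hypothetical and chosen only to show why a 64-bit intermediate such as u_quad_t matters.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical 32-bit version: the product total_usec * ticks is
 * computed modulo 2^32, so it can wrap before the division happens.
 */
uint32_t
share_usec_32(uint32_t total_usec, uint32_t ticks, uint32_t total_ticks)
{
	assert(total_ticks != 0);
	return (total_usec * ticks) / total_ticks;
}

/*
 * Hypothetical 64-bit version: widening one operand first keeps the
 * full product, which is the idea behind calculating in u_quad_t.
 */
uint64_t
share_usec_64(uint32_t total_usec, uint32_t ticks, uint32_t total_ticks)
{
	assert(total_ticks != 0);
	return ((uint64_t)total_usec * ticks) / total_ticks;
}
```

For one hour of runtime (3600000000 microseconds) split 50/100, the 64-bit version yields 1800000000, while the 32-bit product wraps and gives a different, much smaller value.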


# 1.3 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.2 14-Dec-1995 deraadt

from netbsd; limfree()


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.77 04-Feb-2023 cheloha

kernel: stathz is always non-zero after cpu_initclocks()

Now that the clockintr switch is complete, cpu_initclocks() always
initializes stathz to a non-zero value. We don't call statclock()
from hardclock(9) anymore and, more broadly, we don't need to test
whether stathz is non-zero before using it.

With input from kettenis@.

Link: https://marc.info/?l=openbsd-tech&m=167434223309668&w=2

ok kettenis@ miod@


# 1.76 17-Nov-2022 deraadt

stack growth from setrlimit was never updated to set UVM_ET_STACK on
the entries, so the check-sp-at-system-call check failed. Quite strange
it took this long to find this.
ok kettenis


# 1.75 07-Oct-2022 deraadt

Add mimmutable(2) system call which locks the permissions (PROT_*) of
memory mappings so they cannot be changed by a later mmap(), mprotect(),
or munmap(), which will error with EPERM instead.
ok kettenis


Revision tags: OPENBSD_7_2_BASE
# 1.74 28-May-2022 deraadt

oops, wrong value in previous commit


# 1.73 28-May-2022 deraadt

64K of locked memory should be enough for anyone (until we hear a good
reason why)
discussed with many, ok millert


Revision tags: OPENBSD_7_1_BASE
# 1.72 18-Mar-2022 visa

Use the refcnt API with struct plimit.

OK bluhm@ dlg@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.71 08-Feb-2021 mpi

Revert the conversion of per-process thread into a SMR_TAILQ.

We did not reach a consensus about using SMR to unlock single_thread_set()
so there's no point in keeping this change.


# 1.70 07-Dec-2020 mpi

Convert the per-process thread list into a SMR_TAILQ.

Currently all iterations are done under KERNEL_LOCK() and therefore use
the *_LOCKED() variant.

From and ok claudio@


Revision tags: OPENBSD_6_8_BASE
# 1.69 25-Sep-2020 cheloha

setpriority(2): don't treat booleans as scalars

The variable "found" in sys_setpriority() is used as a boolean.
We should set it to 1 to indicate that we found the object we
were looking for instead of incrementing it.

deraadt@ notes that the current code is not buggy, because OpenBSD
cannot support anywhere near 2^32 processes, but agrees that
incrementing the variable signals the wrong thing to the reader.

ok millert@ deraadt@


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.68 15-Jul-2019 mpi

Stop calling resched_proc() after changing the nice(3) value of a process.

Changing the scheduling priority of a process happens rarely, so it isn't
strictly necessary to update the current priority of every thread
instantly.

Moreover resched_proc() isn't well suited to perform this action: it doesn't
consider the state of each thread nor move them to another runqueue.

ok visa@


# 1.67 08-Jul-2019 mpi

Untangle code setting the scheduling priority of a thread.

- `p_estcpu' and `p_usrpri' represent the priority and are now only set
in a single function.

- Call resched_proc() after updating the priority and stop calling it
from schedclock() since `spc_curpriority' should match curproc's priority.

- Rename updatepri() to match decay_cpu() and stop updating per-thread
member.

- Merge two resched_proc() in one inside setrunnable().

Tweak and ok visa@


# 1.66 24-Jun-2019 visa

Guard uvm_map_protect() with kernel lock to prepare dosetrlimit()
for unlocking.

OK semarie@ mpi@ deraadt@ anton@


# 1.65 21-Jun-2019 visa

Make resource limit access MP-safe. So far, the copy-on-write sharing
of resource limit structs has been done between processes. By applying
copy-on-write also between threads, threads can read rlimits in
a nearly lock-free manner.

Inspired by code in DragonFly BSD and FreeBSD.

OK mpi@, agreement from jmatthew@ and anton@


# 1.64 10-Jun-2019 visa

Avoid changing resource limits in rucheck() by introducing a new state
variable that tracks when to send next SIGXCPU. This eases MP work and
prevents accidental alteration of shared resource limit structs.

OK mpi@ semarie@


# 1.63 02-Jun-2019 visa

Move initialization of limit0 into a dedicated function. This new
function is also a proper place for setting up the plimit pool.

While here, raise the IPL of the plimit pool to IPL_MPFLOOR, needed
in upcoming MP work.

OK claudio@


# 1.62 01-Jun-2019 mpi

Revert to using the SCHED_LOCK() to protect time accounting.

It currently creates a lock ordering problem because SCHED_LOCK() is taken
by hardclock(). That means the "priorities" of a thread should be moved
out of the SCHED_LOCK() first in order to make progress.

Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com
via anton@ as well as by kettenis@


# 1.61 31-May-2019 mpi

Use a per-process mutex to protect time accounting instead of SCHED_LOCK().

Note that hardclock(9) still increments p_{u,s,i}ticks without holding a
lock.

ok visa@, cheloha@


# 1.60 31-May-2019 visa

Rename struct plimit field p_refcnt to pl_refcnt to avoid confusion
with the fields of struct proc. Make pl_refcnt unsigned for upcoming
atomic updating.

OK deraadt@ guenther@


Revision tags: OPENBSD_6_5_BASE
# 1.59 06-Jan-2019 visa

Fix unsafe use of ptsignal() in mi_switch().

ptsignal() has to be called with the kernel lock held. As ensuring the
locking in mi_switch() is not easy, and deferring the signaling using
the task API is not possible because of lock order issues in
mi_switch(), move the CPU time checking into a periodic timer where
the kernel can be locked without issues.

With this change, each process has a dedicated resource check timer.
The timer gets activated only when a CPU time limit is set. Because the
checking is not done as frequently as before, some precision is lost.

Use of timers adapted from FreeBSD.

OK tedu@

Reported-by: syzbot+2f5d62256e3280634623@syzkaller.appspotmail.com


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.58 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.

No objection from millert@, ok tedu@, bluhm@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.57 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.56 25-Aug-2016 dlg

pool_setipl

ok kettenis@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.55 05-Dec-2015 tedu

remove stale lint annotations


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.54 09-Feb-2015 miod

Stop using USRSTACK as the edge of the stack, but rather use the vmspace
vm_minsaddr or vm_maxsaddr, depending upon the direction the stack goes in.

This should have no effect on the existing behaviourrr.

ok kettenis@ deraadt@


# 1.53 19-Dec-2014 tedu

start retiring the nointr allocator. specify PR_WAITOK as a flag as a
marker for which pools are not interrupt safe. ok dlg


# 1.52 10-Dec-2014 tedu

convert bcopy to memcpy. ok millert


# 1.51 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.50 30-Mar-2014 guenther

Eliminates struct pcred by moving the real and saved ugids into
struct ucred; struct process then directly links to the ucred

Based on a discussion at c2k10 or so before noting that FreeBSD and
NetBSD did this too.

ok matthew@


# 1.76 17-Nov-2022 deraadt

stack growth from setrlimit was never updated to set UVM_ET_STACK on
the entries, so the check-sp-at-system-call check failed. Quite strange
it took this long to find this.
ok kettenis


# 1.75 07-Oct-2022 deraadt

Add mimmutable(2) system call which locks the permissions (PROT_*) of
memory mappings so they cannot be changed by a later mmap(), mprotect(),
or munmap(), which will error with EPERM instead.
ok kettenis


Revision tags: OPENBSD_7_2_BASE
# 1.74 28-May-2022 deraadt

oops, wrong value in previous commit


# 1.73 28-May-2022 deraadt

64K of locked memory should be enough for anyone (until we hear a good
reason why)
discussed with many, ok millert


Revision tags: OPENBSD_7_1_BASE
# 1.72 18-Mar-2022 visa

Use the refcnt API with struct plimit.

OK bluhm@ dlg@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.71 08-Feb-2021 mpi

Revert the convertion of per-process thread into a SMR_TAILQ.

We did not reach a consensus about using SMR to unlock single_thread_set()
so there's no point in keeping this change.


# 1.70 07-Dec-2020 mpi

Convert the per-process thread list into a SMR_TAILQ.

Currently all iterations are done under KERNEL_LOCK() and therefor use
the *_LOCKED() variant.

From and ok claudio@


Revision tags: OPENBSD_6_8_BASE
# 1.69 25-Sep-2020 cheloha

setpriority(2): don't treat booleans as scalars

The variable "found" in sys_setpriority() is used as a boolean.
We should set it to 1 to indicate that we found the object we
were looking for instead of incrementing it.

deraadt@ notes that the current code is not buggy, because OpenBSD
cannot support anywhere near 2^32 processes, but agrees that
incrementing the variable signals the wrong thing to the reader.

ok millert@ deraadt@


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.68 15-Jul-2019 mpi

Stop calling resched_proc() after changing the nice(3) value of a process.

Changing the scheduling priority of a process happens rarely, so it isn't
strictly necessary to update the current priority of every threads
instantly.

Moreover resched_proc() isn't well suited to perform this action: it doesn't
consider the state of each thread nor move them to another runqueue.

ok visa@


# 1.67 08-Jul-2019 mpi

Untangle code setting the scheduling priority of a thread.

- `p_estcpu' and `p_usrpri' represent the priority and are now only set
in a single function.

- Call resched_proc() after updating the priority and stop calling it
from schedclock() since `spc_curpriority' should match curproc's priority.

- Rename updatepri() to match decay_cpu() and stop updating per-thread
member.

- Merge two resched_proc() in one inside setrunnable().

Tweak and ok visa@


# 1.66 24-Jun-2019 visa

Guard uvm_map_protect() with kernel lock to prepare dosetrlimit()
for unlocking.

OK semarie@ mpi@ deraadt@ anton@


# 1.65 21-Jun-2019 visa

Make resource limit access MP-safe. So far, the copy-on-write sharing
of resource limit structs has been done between processes. By applying
copy-on-write also between threads, threads can read rlimits in
a nearly lock-free manner.

Inspired by code in DragonFly BSD and FreeBSD.

OK mpi@, agreement from jmatthew@ and anton@


# 1.64 10-Jun-2019 visa

Avoid changing resource limits in rucheck() by introducing a new state
variable that tracks when to send next SIGXCPU. This eases MP work and
prevents accidental alteration of shared resource limit structs.

OK mpi@ semarie@


# 1.63 02-Jun-2019 visa

Move initialization of limit0 into a dedicated function. This new
function is also a proper place for setting up the plimit pool.

While here, raise the IPL of the plimit pool to IPL_MPFLOOR, needed
in upcoming MP work.

OK claudio@


# 1.62 01-Jun-2019 mpi

Revert to using the SCHED_LOCK() to protect time accounting.

It currently creates a lock ordering problem because SCHED_LOCK() is taken
by hardclock(). That means the "priorities" of a thread should be moved
out of the SCHED_LOCK() first in order to make progress.

Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com
via anton@ as well as by kettenis@


# 1.61 31-May-2019 mpi

Use a per-process mutex to protect time accounting instead of SCHED_LOCK().

Note that hardclock(9) still increments p_{u,s,i}ticks without holding a
lock.

ok visa@, cheloha@


# 1.60 31-May-2019 visa

Rename struct plimit field p_refcnt to pl_refcnt to avoid confusion
with the fields of struct proc. Make pl_refcnt unsigned for upcoming
atomic updating.

OK deraadt@ guenther@


Revision tags: OPENBSD_6_5_BASE
# 1.59 06-Jan-2019 visa

Fix unsafe use of ptsignal() in mi_switch().

ptsignal() has to be called with the kernel lock held. As ensuring the
locking in mi_switch() is not easy, and deferring the signaling using
the task API is not possible because of lock order issues in
mi_switch(), move the CPU time checking into a periodic timer where
the kernel can be locked without issues.

With this change, each process has a dedicated resource check timer.
The timer gets activated only when a CPU time limit is set. Because the
checking is not done as frequently as before, some precision is lost.

Use of timers adapted from FreeBSD.

OK tedu@

Reported-by: syzbot+2f5d62256e3280634623@syzkaller.appspotmail.com


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.58 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer mess with a per-process field.

No objection from millert@, ok tedu@, bluhm@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.57 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.56 25-Aug-2016 dlg

pool_setipl

ok kettenis@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.55 05-Dec-2015 tedu

remove stale lint annotations


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.54 09-Feb-2015 miod

Stop using USRSTACK as the edge of the stack, but rather use the vmspace
vm_minsaddr or vm_maxsaddr, depending upon the direction the stack goes in.

This should have no effect on the existing behaviourrr.

ok kettenis@ deraadt@


# 1.53 19-Dec-2014 tedu

start retiring the nointr allocator. specify PR_WAITOK as a flag as a
marker for which pools are not interrupt safe. ok dlg


# 1.52 10-Dec-2014 tedu

convert bcopy to memcpy. ok millert


# 1.51 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.50 30-Mar-2014 guenther

Eliminate struct pcred by moving the real and saved ugids into
struct ucred; struct process then directly links to the ucred

Based on a discussion at c2k10 or so before noting that FreeBSD and
NetBSD did this too.

ok matthew@


Revision tags: OPENBSD_5_5_BASE
# 1.49 24-Jan-2014 guenther

exit1() needs to do a final aggregation of the thread's [us]ticks
and runtime to the process totals. Also, add ktracing of struct
rusage in wait4() and getrusage().

problem pointed out by tedu@
ok deraadt@


# 1.48 21-Jan-2014 tedu

bzero -> memset


# 1.47 20-Jan-2014 guenther

Threads can't be zombies, only processes, so change zombproc to zombprocess,
make it a list of processes, and change P_NOZOMBIE and P_STOPPED from thread
flags to process flags. Add allprocess list for the code that just wants
to see processes.

ok tedu@


# 1.46 25-Oct-2013 guenther

Move the declarations for dogetrusage(), itimerround(), and dowait4()
to sys/*.h headers so that the compat/linux code can use them.
Change dowait4() to not copyout() the status value, but rather leave
that for its caller, as compat/linux has to translate it, with the
side benefit of simplifying the native code.

Originally written months ago as part of the time_t work; long
memory, prodding, and ok from pirofti@


# 1.45 14-Sep-2013 guenther

Eliminate the unused retval argument from dogetrusage()


# 1.44 14-Sep-2013 guenther

Snapshots for all archs have been built, so remove the T32 code


# 1.43 13-Aug-2013 guenther

Switch time_t, ino_t, clock_t, and struct kevent's ident and data
members to 64bit types. Assign new syscall numbers for (almost
all) the syscalls that involve the affected types, including anything
with time_t, timeval, itimerval, timespec, rusage, dirent, stat,
or kevent arguments. Add a d_off member to struct dirent and replace
getdirentries() with getdents(), thus immensely simplifying and
accelerating telldir/seekdir. Build perl with -DBIG_TIME.

Bump the major on every single base library: the compat bits included
here are only good enough to make the transition; the T32 compat
option will be burned as soon as we've reached the new world and
are happy with the snapshots for all architectures.

DANGER: ABI incompatibility. Updating to this kernel requires extra
work or you won't be able to login: install a snapshot instead.

Much assistance in fixing userland issues from deraadt@ and tedu@
and build assistance from todd@ and otto@


Revision tags: OPENBSD_5_4_BASE
# 1.42 03-Jun-2013 guenther

Convert some internal APIs to use timespecs instead of timevals

ok matthew@ deraadt@


# 1.41 01-Apr-2013 guenther

Make setrlimit() return EINVAL if rlim_cur > rlim_max, per POSIX.
Use limfree() instead of decrementing the reference counter directly.

ok kettenis@
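
Illustration for 1.41 (not part of the commit): the rule that a soft limit
above the hard limit is rejected can be sketched in user space as below.
The helper name rlimit_check() is hypothetical and this is not the kernel's
dosetrlimit(); it only mirrors the EINVAL condition stated in the message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical helper, not kernel code: returns 0 when the pair is
 * acceptable and EINVAL when the soft limit (rlim_cur) exceeds the
 * hard limit (rlim_max).
 */
int
rlimit_check(const struct rlimit *rl)
{
	assert(rl != NULL);
	if (rl->rlim_cur > rl->rlim_max)
		return EINVAL;
	return 0;
}
```

A caller would run such a check before installing the new values, so an
inconsistent pair never reaches the stored limits.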


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.40 10-Apr-2012 guenther

Make the KERN_NPROCS and KERN_MAXPROC sysctl()s and the RLIMIT_NPROC rlimit
count processes instead of threads. New sysctl()s KERN_NTHREADS and
KERN_MAXTHREAD count and limit threads. The nprocs and maxproc kernel
variables are replaced by nprocess, maxprocess, nthreads, and maxthread.

ok tedu@ mikeb@


# 1.39 23-Mar-2012 guenther

Make rusage totals, itimers, and profile settings per-process instead
of per-rthread. Handling of per-thread tick and runtime counters
inspired by how FreeBSD does it.

ok kettenis@


# 1.38 19-Mar-2012 guenther

Add tracing and dumping of "pointer to struct" syscall arguments for
structs timespec, timeval, sigaction, and rlimit.

ok otto@ jsing@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.37 07-Mar-2011 guenther

The scheduling 'nice' value is per-process, not per-thread, so move it
into struct process.

ok tedu@ deraadt@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.36 26-Jul-2010 guenther

Correct the links between threads, processes, pgrps, and sessions,
so that the process-level stuff is to/from struct process and not
struct proc. This fixes a bunch of problem cases in rthreads.
Based on earlier work by blambert and myself, but mostly written
at c2k10.

Tested by many: deraadt, sthen, krw, ray, and in snapshots


# 1.35 29-Jun-2010 guenther

Eliminate struct plimit's PL_SHAREMOD flag: it was for COMPAT_IRIX
sproc() support, but we don't have COMPAT_IRIX.
ok krw@ tedu@


Revision tags: OPENBSD_4_7_BASE
# 1.34 04-Jan-2010 guenther

Don't decrement the refcnt on a plimits until after we're done
copying it, so that the process can't sleep in pool_get() and have
the source structure get pool_put() or modified behind its back.

ok deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.33 22-May-2008 thib

Use LIST_FOREACH() instead of handrolling.

From: Pierre Riteau pierre.riteau_att_gmail.com
OK miod@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.32 12-Apr-2007 tedu

move p_limit and p_cred into struct process
leave macros behind for now to keep the commit small
ok art beck miod pedro


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.31 28-Nov-2005 jsg

ansi/deregister.
'go for it' deraadt@


Revision tags: OPENBSD_3_8_BASE
# 1.30 29-May-2005 deraadt

sched work by niklas and art backed out; causes panics


# 1.29 25-May-2005 niklas

This patch is mostly art's work and was done *a year* ago. Art wants to thank
everyone for the prompt review and ok of this work ;-) Yeah, that includes me
too, or maybe especially me. I am sorry.

Change the sched_lock to a mutex. This fixes, among other things, the infamous
"telnet localhost &" problem. The real bug in that case was that the sched_lock
which is by design a non-recursive lock, was recursively acquired, and not
enough releases made us hold the lock in the idle loop, blocking scheduling
on the other processors. Some of the other processors would hold the biglock though,
which made it impossible for cpu 0 to enter the kernel... A nice deadlock.
Let me just say debugging this for days just to realize that it was all fixed
in an old diff no one ever ok'd was somewhat of an anti-climax.

This diff also changes splsched to be correct for all our architectures.


Revision tags: OPENBSD_3_7_BASE
# 1.28 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_6_BASE
# 1.27 13-Jun-2004 niklas

debranch SMP, have fun


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.26 11-Dec-2003 millert

Add id_t type as per POSIX and use it for [gs]etpriority(2).
OK henning@ and deraadt@


# 1.25 11-Dec-2003 millert

POSIX says rlim_t should be unsigned so make it u_quad_t. Also add
POSIX-mandated RLIM_SAVED_MAX and RLIM_SAVED_CUR defines. On OpenBSD
these are identical to RLIM_INFINITY as allowed by POSIX. OK deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.24 01-Sep-2003 henning

match syscallargs comments with reality
from Patrick Latifi <patrick.l@hermes.usherb.ca>
ok jason@ tedu@


# 1.23 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.22 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.21 15-Oct-2002 nordin

Match reality by changing (u_int) -> (int) in comments.


Revision tags: OPENBSD_3_2_BASE
# 1.20 02-Oct-2002 nordin

branches: 1.20.2;
Check for negative values. Inspiration from tedu <grendel@zeitbombe.org>.
ok deraadt@ and art@


# 1.19 21-Jul-2002 art

Map stack pages without VM_PROT_EXECUTE. Notice that right now this
doesn't do anything since no pmap implements exec protection yet.


Revision tags: OPENBSD_3_1_BASE
# 1.18 25-Jan-2002 art

branches: 1.18.2;
Convert plimit allocations to pool.


# 1.17 20-Dec-2001 nordin

Make user/system times increase monotonically. ok deraadt@ and millert@


Revision tags: UBC_BASE
# 1.16 10-Nov-2001 art

branches: 1.16.2;
Move maxdmap and maxsmap to kern_resource.c


# 1.15 06-Nov-2001 miod

Replace inclusion of <vm/foo.h> with the correct <uvm/bar.h> when necessary.
(Look ma, I might have broken the tree)


Revision tags: OPENBSD_3_0_BASE
# 1.14 27-Jun-2001 art

branches: 1.14.2;
remove old vm


# 1.13 26-May-2001 art

Make it a bit more obvious what dosetrlimit does. (shrink).


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE OPENBSD_2_9_BASE
# 1.12 05-May-2000 art

Add limfree prototype to sys/resourcevar.h.


# 1.11 03-Mar-2000 art

Use LIST_ macros instead of internal field names to walk the allproc list.


Revision tags: SMP_BASE kame_19991208
# 1.10 05-Nov-1999 mickey

branches: 1.10.2;
more stack direction fixes; art@ ok


Revision tags: OPENBSD_2_6_BASE
# 1.9 15-Jul-1999 art

vm_offset_t -> {v,p}addr_t ; vm_size_t -> {v,p}size_t


Revision tags: OPENBSD_2_5_BASE
# 1.8 26-Feb-1999 art

uvm allocation and name changes


Revision tags: OPENBSD_2_1_BASE OPENBSD_2_2_BASE OPENBSD_2_3_BASE OPENBSD_2_4_BASE
# 1.7 24-Nov-1996 millert

Sync with NetBSD. Figure NZERO into priorities and that rlim_cur
and rlim_max are >0.


Revision tags: OPENBSD_2_0_BASE
# 1.6 27-Jul-1996 deraadt

sec can be a long


# 1.5 02-Jul-1996 deraadt

unsigned usec can go negative, should be added in as is; netbsd pr#2585; Juergen.Fluk@lrz.tu-muenchen.de


# 1.4 20-Jun-1996 deraadt

calcru() must calculate using u_quad_t to avoid overflows; netbsd pr#2496, brb@exp.com
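
Illustration for 1.4 (not part of the commit): a minimal sketch of why a
proportional split of run time needs a 64-bit intermediate. The helper
usec_share() and its parameters are hypothetical and only assume that the
calculation multiplies a time total by a tick count before dividing.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the share of total_usec attributable to `ticks`
 * out of `total_ticks`. The product is formed in 64 bits; a 32-bit
 * product would wrap once total_usec * ticks exceeds UINT32_MAX.
 */
uint64_t
usec_share(uint32_t total_usec, uint32_t ticks, uint32_t total_ticks)
{
	assert(total_ticks != 0);
	return ((uint64_t)total_usec * ticks) / total_ticks;
}
```

Widening one operand before the multiplication is enough; the division
then brings the result back into a range that fits the original type.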


# 1.3 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.2 14-Dec-1995 deraadt

from netbsd; limfree()


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.75 07-Oct-2022 deraadt

Add mimmutable(2) system call which locks the permissions (PROT_*) of
memory mappings so they cannot be changed by a later mmap(), mprotect(),
or munmap(), which will error with EPERM instead.
ok kettenis


Revision tags: OPENBSD_7_2_BASE
# 1.74 28-May-2022 deraadt

oops, wrong value in previous commit


# 1.73 28-May-2022 deraadt

64K of locked memory should be enough for anyone (until we hear a good
reason why)
discussed with many, ok millert


Revision tags: OPENBSD_7_1_BASE
# 1.72 18-Mar-2022 visa

Use the refcnt API with struct plimit.

OK bluhm@ dlg@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.71 08-Feb-2021 mpi

Revert the conversion of per-process thread into a SMR_TAILQ.

We did not reach a consensus about using SMR to unlock single_thread_set()
so there's no point in keeping this change.


# 1.70 07-Dec-2020 mpi

Convert the per-process thread list into a SMR_TAILQ.

Currently all iterations are done under KERNEL_LOCK() and therefore use
the *_LOCKED() variant.

From and ok claudio@


Revision tags: OPENBSD_6_8_BASE
# 1.69 25-Sep-2020 cheloha

setpriority(2): don't treat booleans as scalars

The variable "found" in sys_setpriority() is used as a boolean.
We should set it to 1 to indicate that we found the object we
were looking for instead of incrementing it.

deraadt@ notes that the current code is not buggy, because OpenBSD
cannot support anywhere near 2^32 processes, but agrees that
incrementing the variable signals the wrong thing to the reader.

ok millert@ deraadt@
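
Illustration for 1.69 (not part of the commit): the pattern the message
describes, reduced to a stand-alone sketch. The function any_match() and
its array argument are hypothetical stand-ins for the lookup loop in
sys_setpriority().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the lookup loop: report whether any entry
 * of ids[] equals who. "found" is a flag, so it is set to 1 on a match
 * rather than incremented.
 */
int
any_match(const int *ids, size_t n, int who)
{
	int found = 0;
	size_t i;

	assert(ids != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (ids[i] == who)
			found = 1;	/* not found++ */
	}
	return found;
}
```

With assignment the result is always 0 or 1, however many entries match,
which is what a reader expects from a boolean.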


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.68 15-Jul-2019 mpi

Stop calling resched_proc() after changing the nice(3) value of a process.

Changing the scheduling priority of a process happens rarely, so it isn't
strictly necessary to update the current priority of every thread
instantly.

Moreover resched_proc() isn't well suited to perform this action: it doesn't
consider the state of each thread nor move them to another runqueue.

ok visa@


# 1.67 08-Jul-2019 mpi

Untangle code setting the scheduling priority of a thread.

- `p_estcpu' and `p_usrpri' represent the priority and are now only set
in a single function.

- Call resched_proc() after updating the priority and stop calling it
from schedclock() since `spc_curpriority' should match curproc's priority.

- Rename updatepri() to match decay_cpu() and stop updating per-thread
member.

- Merge two resched_proc() in one inside setrunnable().

Tweak and ok visa@


# 1.66 24-Jun-2019 visa

Guard uvm_map_protect() with kernel lock to prepare dosetrlimit()
for unlocking.

OK semarie@ mpi@ deraadt@ anton@


# 1.65 21-Jun-2019 visa

Make resource limit access MP-safe. So far, the copy-on-write sharing
of resource limit structs has been done between processes. By applying
copy-on-write also between threads, threads can read rlimits in
a nearly lock-free manner.

Inspired by code in DragonFly BSD and FreeBSD.

OK mpi@, agreement from jmatthew@ and anton@
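
Illustration for 1.65 (not part of the commit): a single-threaded sketch of
copy-on-write sharing with a reference count. The names struct limits,
limits_share() and limits_writable() are hypothetical, and the sketch leaves
out the atomic operations and locking that real MP-safe code would need.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NLIMITS 4	/* illustrative size, not RLIM_NLIMITS */

struct limits {
	unsigned int	refcnt;		/* number of holders */
	long		val[NLIMITS];
};

/* Share an existing struct: bump the count, copy nothing. */
struct limits *
limits_share(struct limits *l)
{
	assert(l != NULL && l->refcnt > 0);
	l->refcnt++;
	return l;
}

/*
 * Return a struct the caller may modify. A sole holder keeps its
 * struct; a sharer gets a private copy and only then drops its
 * reference to the shared one. Returns NULL if allocation fails.
 */
struct limits *
limits_writable(struct limits *l)
{
	struct limits *n;

	assert(l != NULL && l->refcnt > 0);
	if (l->refcnt == 1)
		return l;
	n = malloc(sizeof(*n));
	if (n == NULL)
		return NULL;
	memcpy(n, l, sizeof(*n));
	n->refcnt = 1;
	l->refcnt--;
	return n;
}
```

Readers only ever follow a pointer to a struct that is not modified while
shared, which is what makes the read path cheap; writers pay for the copy.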


# 1.64 10-Jun-2019 visa

Avoid changing resource limits in rucheck() by introducing a new state
variable that tracks when to send next SIGXCPU. This eases MP work and
prevents accidental alteration of shared resource limit structs.

OK mpi@ semarie@


# 1.63 02-Jun-2019 visa

Move initialization of limit0 into a dedicated function. This new
function is also a proper place for setting up the plimit pool.

While here, raise the IPL of the plimit pool to IPL_MPFLOOR, needed
in upcoming MP work.

OK claudio@


# 1.62 01-Jun-2019 mpi

Revert to using the SCHED_LOCK() to protect time accounting.

It currently creates a lock ordering problem because SCHED_LOCK() is taken
by hardclock(). That means the "priorities" of a thread should be moved
out of the SCHED_LOCK() first in order to make progress.

Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com
via anton@ as well as by kettenis@


# 1.61 31-May-2019 mpi

Use a per-process mutex to protect time accounting instead of SCHED_LOCK().

Note that hardclock(9) still increments p_{u,s,i}ticks without holding a
lock.

ok visa@, cheloha@


# 1.60 31-May-2019 visa

Rename struct plimit field p_refcnt to pl_refcnt to avoid confusion
with the fields of struct proc. Make pl_refcnt unsigned for upcoming
atomic updating.

OK deraadt@ guenther@


Revision tags: OPENBSD_6_5_BASE
# 1.59 06-Jan-2019 visa

Fix unsafe use of ptsignal() in mi_switch().

ptsignal() has to be called with the kernel lock held. As ensuring the
locking in mi_switch() is not easy, and deferring the signaling using
the task API is not possible because of lock order issues in
mi_switch(), move the CPU time checking into a periodic timer where
the kernel can be locked without issues.

With this change, each process has a dedicated resource check timer.
The timer gets activated only when a CPU time limit is set. Because the
checking is not done as frequently as before, some precision is lost.

Use of timers adapted from FreeBSD.

OK tedu@

Reported-by: syzbot+2f5d62256e3280634623@syzkaller.appspotmail.com


# 1.74 28-May-2022 deraadt

oops, wrong value in previous commit


# 1.73 28-May-2022 deraadt

64K of locked memory should be enough for anyone (until we hear a good
reason why)
discussed with many, ok millert


Revision tags: OPENBSD_7_1_BASE
# 1.72 18-Mar-2022 visa

Use the refcnt API with struct plimit.

OK bluhm@ dlg@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.71 08-Feb-2021 mpi

Revert the convertion of per-process thread into a SMR_TAILQ.

We did not reach a consensus about using SMR to unlock single_thread_set()
so there's no point in keeping this change.


# 1.70 07-Dec-2020 mpi

Convert the per-process thread list into a SMR_TAILQ.

Currently all iterations are done under KERNEL_LOCK() and therefor use
the *_LOCKED() variant.

From and ok claudio@


Revision tags: OPENBSD_6_8_BASE
# 1.69 25-Sep-2020 cheloha

setpriority(2): don't treat booleans as scalars

The variable "found" in sys_setpriority() is used as a boolean.
We should set it to 1 to indicate that we found the object we
were looking for instead of incrementing it.

deraadt@ notes that the current code is not buggy, because OpenBSD
cannot support anywhere near 2^32 processes, but agrees that
incrementing the variable signals the wrong thing to the reader.

ok millert@ deraadt@


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.68 15-Jul-2019 mpi

Stop calling resched_proc() after changing the nice(3) value of a process.

Changing the scheduling priority of a process happens rarely, so it isn't
strictly necessary to update the current priority of every threads
instantly.

Moreover resched_proc() isn't well suited to perform this action: it doesn't
consider the state of each thread nor move them to another runqueue.

ok visa@


# 1.67 08-Jul-2019 mpi

Untangle code setting the scheduling priority of a thread.

- `p_estcpu' and `p_usrpri' represent the priority and are now only set
in a single function.

- Call resched_proc() after updating the priority and stop calling it
from schedclock() since `spc_curpriority' should match curproc's priority.

- Rename updatepri() to match decay_cpu() and stop updating per-thread
member.

- Merge two resched_proc() in one inside setrunnable().

Tweak and ok visa@


# 1.66 24-Jun-2019 visa

Guard uvm_map_protect() with kernel lock to prepare dosetrlimit()
for unlocking.

OK semarie@ mpi@ deraadt@ anton@


# 1.65 21-Jun-2019 visa

Make resource limit access MP-safe. So far, the copy-on-write sharing
of resource limit structs has been done between processes. By applying
copy-on-write also between threads, threads can read rlimits in
a nearly lock-free manner.

Inspired by code in DragonFly BSD and FreeBSD.

OK mpi@, agreement from jmatthew@ and anton@


# 1.64 10-Jun-2019 visa

Avoid changing resource limits in rucheck() by introducing a new state
variable that tracks when to send next SIGXCPU. This eases MP work and
prevents accidental alteration of shared resource limit structs.

OK mpi@ semarie@


# 1.63 02-Jun-2019 visa

Move initialization of limit0 into a dedicated function. This new
function is also a proper place for setting up the plimit pool.

While here, raise the IPL of the plimit pool to IPL_MPFLOOR, needed
in upcoming MP work.

OK claudio@


# 1.62 01-Jun-2019 mpi

Revert to using the SCHED_LOCK() to protect time accounting.

It currently creates a lock ordering problem because SCHED_LOCK() is taken
by hardclock(). That means the "priorities" of a thread should be moved
out of the SCHED_LOCK() first in order to make progress.

Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com
via anton@ as well as by kettenis@


# 1.61 31-May-2019 mpi

Use a per-process mutex to protect time accounting instead of SCHED_LOCK().

Note that hardclock(9) still increments p_{u,s,i}ticks without holding a
lock.

ok visa@, cheloha@


# 1.60 31-May-2019 visa

Rename struct plimit field p_refcnt to pl_refcnt to avoid confusion
with the fields of struct proc. Make pl_refcnt unsigned for upcoming
atomic updating.

OK deraadt@ guenther@


Revision tags: OPENBSD_6_5_BASE
# 1.59 06-Jan-2019 visa

Fix unsafe use of ptsignal() in mi_switch().

ptsignal() has to be called with the kernel lock held. As ensuring the
locking in mi_switch() is not easy, and deferring the signaling using
the task API is not possible because of lock order issues in
mi_switch(), move the CPU time checking into a periodic timer where
the kernel can be locked without issues.

With this change, each process has a dedicated resource check timer.
The timer gets activated only when a CPU time limit is set. Because the
checking is not done as frequently as before, some precision is lost.

Use of timers adapted from FreeBSD.

OK tedu@

Reported-by: syzbot+2f5d62256e3280634623@syzkaller.appspotmail.com


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.58 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.

No objection from millert@, ok tedu@, bluhm@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.57 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.56 25-Aug-2016 dlg

pool_setipl

ok kettenis@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.55 05-Dec-2015 tedu

remove stale lint annotations


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.54 09-Feb-2015 miod

Stop using USRSTACK as the edge of the stack, but rather use the vmspace
vm_minsaddr or vm_maxsaddr, depending upon the direction the stack goes in.

This should have no effect on the existing behaviour.

ok kettenis@ deraadt@


# 1.53 19-Dec-2014 tedu

start retiring the nointr allocator. specify PR_WAITOK as a flag as a
marker for which pools are not interrupt safe. ok dlg


# 1.52 10-Dec-2014 tedu

convert bcopy to memcpy. ok millert


# 1.51 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.50 30-Mar-2014 guenther

Eliminates struct pcred by moving the real and saved ugids into
struct ucred; struct process then directly links to the ucred

Based on a discussion at c2k10 or so before noting that FreeBSD and
NetBSD did this too.

ok matthew@


Revision tags: OPENBSD_5_5_BASE
# 1.49 24-Jan-2014 guenther

exit1() needs to do a final aggregation of the thread's [us]ticks
and runtime to the process totals. Also, add ktracing of struct
rusage in wait4() and getrusage().

problem pointed out by tedu@
ok deraadt@


# 1.48 21-Jan-2014 tedu

bzero -> memset


# 1.47 20-Jan-2014 guenther

Threads can't be zombies, only processes, so change zombproc to zombprocess,
make it a list of processes, and change P_NOZOMBIE and P_STOPPED from thread
flags to process flags. Add allprocess list for the code that just wants
to see processes.

ok tedu@


# 1.46 25-Oct-2013 guenther

Move the declarations for dogetrusage(), itimerround(), and dowait4()
to sys/*.h headers so that the compat/linux code can use them.
Change dowait4() to not copyout() the status value, but rather leave
that for its caller, as compat/linux has to translate it, with the
side benefit of simplifying the native code.

Originally written months ago as part of the time_t work; long
memory, prodding, and ok from pirofti@


# 1.45 14-Sep-2013 guenther

Eliminate the unused retval argument from dogetrusage()


# 1.44 14-Sep-2013 guenther

Snapshots for all archs have been built, so remove the T32 code


# 1.43 13-Aug-2013 guenther

Switch time_t, ino_t, clock_t, and struct kevent's ident and data
members to 64bit types. Assign new syscall numbers for (almost
all) the syscalls that involve the affected types, including anything
with time_t, timeval, itimerval, timespec, rusage, dirent, stat,
or kevent arguments. Add a d_off member to struct dirent and replace
getdirentries() with getdents(), thus immensely simplifying and
accelerating telldir/seekdir. Build perl with -DBIG_TIME.

Bump the major on every single base library: the compat bits included
here are only good enough to make the transition; the T32 compat
option will be burned as soon as we've reached the new world and
are happy with the snapshots for all architectures.

DANGER: ABI incompatibility. Updating to this kernel requires extra
work or you won't be able to login: install a snapshot instead.

Much assistance in fixing userland issues from deraadt@ and tedu@
and build assistance from todd@ and otto@


Revision tags: OPENBSD_5_4_BASE
# 1.42 03-Jun-2013 guenther

Convert some internal APIs to use timespecs instead of timevals

ok matthew@ deraadt@


# 1.41 01-Apr-2013 guenther

Make setrlimit() return EINVAL if rlim_cur > rlim_max, per POSIX.
Use limfree() instead of decrementing the reference counter directly.

ok kettenis@


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.40 10-Apr-2012 guenther

Make the KERN_NPROCS and KERN_MAXPROC sysctl()s and the RLIMIT_NPROC rlimit
count processes instead of threads. New sysctl()s KERN_NTHREADS and
KERN_MAXTHREAD count and limit threads. The nprocs and maxproc kernel
variables are replaced by nprocess, maxprocess, nthreads, and maxthread.

ok tedu@ mikeb@


# 1.39 23-Mar-2012 guenther

Make rusage totals, itimers, and profile settings per-process instead
of per-rthread. Handling of per-thread tick and runtime counters
inspired by how FreeBSD does it.

ok kettenis@


# 1.38 19-Mar-2012 guenther

Add tracing and dumping of "pointer to struct" syscall arguments for
structs timespec, timeval, sigaction, and rlimit.

ok otto@ jsing@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.37 07-Mar-2011 guenther

The scheduling 'nice' value is per-process, not per-thread, so move it
into struct process.

ok tedu@ deraadt@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.36 26-Jul-2010 guenther

Correct the links between threads, processes, pgrps, and sessions,
so that the process-level stuff is to/from struct process and not
struct proc. This fixes a bunch of problem cases in rthreads.
Based on earlier work by blambert and myself, but mostly written
at c2k10.

Tested by many: deraadt, sthen, krw, ray, and in snapshots


# 1.35 29-Jun-2010 guenther

Eliminate struct plimit's PL_SHAREMOD flag: it was for COMPAT_IRIX
sproc() support, but we don't have COMPAT_IRIX.
ok krw@ tedu@


Revision tags: OPENBSD_4_7_BASE
# 1.34 04-Jan-2010 guenther

Don't decrement the refcnt on a plimits until after we're done
copying it, so that the process can't sleep in pool_get() and have
the source structure get pool_put() or modified behind its back.

ok deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.33 22-May-2008 thib

Use LIST_FOREACH() instead of handrolling.

From: Pierre Riteau pierre.riteau_att_gmail.com
OK miod@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.32 12-Apr-2007 tedu

move p_limit and p_cred into struct process
leave macros behind for now to keep the commit small
ok art beck miod pedro


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.31 28-Nov-2005 jsg

ansi/deregister.
'go for it' deraadt@


Revision tags: OPENBSD_3_8_BASE
# 1.30 29-May-2005 deraadt

sched work by niklas and art backed out; causes panics


# 1.29 25-May-2005 niklas

This patch is mostly art's work and was done *a year* ago. Art wants to thank
everyone for the prompt review and ok of this work ;-) Yeah, that includes me
too, or maybe especially me. I am sorry.

Change the sched_lock to a mutex. This fixes, among other things, the infamous
"telnet localhost &" problem. The real bug in that case was that the sched_lock
which is by design a non-recursive lock, was recursively acquired, and not
enough releases made us hold the lock in the idle loop, blocking scheduling
on the other processors. Some of the other processors would hold the biglock though,
which made it impossible for cpu 0 to enter the kernel... A nice deadlock.
Let me just say debugging this for days just to realize that it was all fixed
in an old diff no one ever ok'd was somewhat of an anti-climax.

This diff also changes splsched to be correct for all our architectures.


Revision tags: OPENBSD_3_7_BASE
# 1.28 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_6_BASE
# 1.27 13-Jun-2004 niklas

debranch SMP, have fun


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.26 11-Dec-2003 millert

Add id_t type as per POSIX and use it for [gs]etpriority(2).
OK henning@ and deraadt@


# 1.25 11-Dec-2003 millert

POSIX says rlim_t should be unsigned so make it u_quad_t. Also add
POSIX-mandated RLIM_SAVED_MAX and RLIM_SAVED_CUR defines. On OpenBSD
these are identical to RLIM_INFINITY as allowed by POSIX. OK deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.24 01-Sep-2003 henning

match syscallargs comments with reality
from Patrick Latifi <patrick.l@hermes.usherb.ca>
ok jason@ tedu@


# 1.23 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.22 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.21 15-Oct-2002 nordin

Match reality by changing (u_int) -> (int) in comments.


Revision tags: OPENBSD_3_2_BASE
# 1.20 02-Oct-2002 nordin

branches: 1.20.2;
Check for negative values. Inspiration from tedu <grendel@zeitbombe.org>.
ok deraadt@ and art@


# 1.19 21-Jul-2002 art

Map stack pages without VM_PROT_EXECUTE. Notice that right now this
doesn't do anything since no pmap implements exec protection yet.


Revision tags: OPENBSD_3_1_BASE
# 1.18 25-Jan-2002 art

branches: 1.18.2;
Convert plimit allocations to pool.


# 1.17 20-Dec-2001 nordin

Make user/system times increase monotonically. ok deraadt@ and millert@


Revision tags: UBC_BASE
# 1.16 10-Nov-2001 art

branches: 1.16.2;
Move maxdmap and maxsmap to kern_resource.c


# 1.15 06-Nov-2001 miod

Replace inclusion of <vm/foo.h> with the correct <uvm/bar.h> when necessary.
(Look ma, I might have broken the tree)


Revision tags: OPENBSD_3_0_BASE
# 1.14 27-Jun-2001 art

branches: 1.14.2;
remove old vm


# 1.13 26-May-2001 art

Make it a bit more obvious what dosetrlimit does. (shrink).


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE OPENBSD_2_9_BASE
# 1.12 05-May-2000 art

Add limfree prototype to sys/resourcevar.h.


# 1.11 03-Mar-2000 art

Use LIST_ macros instead of internal field names to walk the allproc list.


Revision tags: SMP_BASE kame_19991208
# 1.10 05-Nov-1999 mickey

branches: 1.10.2;
more stack direction fixes; art@ ok


Revision tags: OPENBSD_2_6_BASE
# 1.9 15-Jul-1999 art

vm_offset_t -> {v,p}addr_t ; vm_size_t -> {v,p}size_t


Revision tags: OPENBSD_2_5_BASE
# 1.8 26-Feb-1999 art

uvm allocation and name changes


Revision tags: OPENBSD_2_1_BASE OPENBSD_2_2_BASE OPENBSD_2_3_BASE OPENBSD_2_4_BASE
# 1.7 24-Nov-1996 millert

Sync with NetBSD. Figure NZERO into priorities and that rlim_cur
and rlim_max are >0.


Revision tags: OPENBSD_2_0_BASE
# 1.6 27-Jul-1996 deraadt

sec can be a long


# 1.5 02-Jul-1996 deraadt

unsigned usec can go negative, should be added in as is; netbsd pr#2585; Juergen.Fluk@lrz.tu-muenchen.de


# 1.4 20-Jun-1996 deraadt

calcru() must calculate using u_quad_t to avoid overflows; netbsd pr#2496, brb@exp.com


# 1.3 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.2 14-Dec-1995 deraadt

from netbsd; limfree()


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.72 18-Mar-2022 visa

Use the refcnt API with struct plimit.

OK bluhm@ dlg@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.71 08-Feb-2021 mpi

Revert the conversion of per-process thread into a SMR_TAILQ.

We did not reach a consensus about using SMR to unlock single_thread_set()
so there's no point in keeping this change.


# 1.70 07-Dec-2020 mpi

Convert the per-process thread list into a SMR_TAILQ.

Currently all iterations are done under KERNEL_LOCK() and therefore use
the *_LOCKED() variant.

From and ok claudio@


Revision tags: OPENBSD_6_8_BASE
# 1.69 25-Sep-2020 cheloha

setpriority(2): don't treat booleans as scalars

The variable "found" in sys_setpriority() is used as a boolean.
We should set it to 1 to indicate that we found the object we
were looking for instead of incrementing it.

deraadt@ notes that the current code is not buggy, because OpenBSD
cannot support anywhere near 2^32 processes, but agrees that
incrementing the variable signals the wrong thing to the reader.

ok millert@ deraadt@


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.68 15-Jul-2019 mpi

Stop calling resched_proc() after changing the nice(3) value of a process.

Changing the scheduling priority of a process happens rarely, so it isn't
strictly necessary to update the current priority of every thread
instantly.

Moreover resched_proc() isn't well suited to perform this action: it doesn't
consider the state of each thread nor move them to another runqueue.

ok visa@


# 1.67 08-Jul-2019 mpi

Untangle code setting the scheduling priority of a thread.

- `p_estcpu' and `p_usrpri' represent the priority and are now only set
in a single function.

- Call resched_proc() after updating the priority and stop calling it
from schedclock() since `spc_curpriority' should match curproc's priority.

- Rename updatepri() to match decay_cpu() and stop updating per-thread
member.

- Merge two resched_proc() in one inside setrunnable().

Tweak and ok visa@


# 1.66 24-Jun-2019 visa

Guard uvm_map_protect() with kernel lock to prepare dosetrlimit()
for unlocking.

OK semarie@ mpi@ deraadt@ anton@


# 1.71 08-Feb-2021 mpi

Revert the convertion of per-process thread into a SMR_TAILQ.

We did not reach a consensus about using SMR to unlock single_thread_set()
so there's no point in keeping this change.


# 1.70 07-Dec-2020 mpi

Convert the per-process thread list into a SMR_TAILQ.

Currently all iterations are done under KERNEL_LOCK() and therefor use
the *_LOCKED() variant.

From and ok claudio@


Revision tags: OPENBSD_6_8_BASE
# 1.69 25-Sep-2020 cheloha

setpriority(2): don't treat booleans as scalars

The variable "found" in sys_setpriority() is used as a boolean.
We should set it to 1 to indicate that we found the object we
were looking for instead of incrementing it.

deraadt@ notes that the current code is not buggy, because OpenBSD
cannot support anywhere near 2^32 processes, but agrees that
incrementing the variable signals the wrong thing to the reader.

ok millert@ deraadt@


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.68 15-Jul-2019 mpi

Stop calling resched_proc() after changing the nice(3) value of a process.

Changing the scheduling priority of a process happens rarely, so it isn't
strictly necessary to update the current priority of every threads
instantly.

Moreover resched_proc() isn't well suited to perform this action: it doesn't
consider the state of each thread nor move them to another runqueue.

ok visa@


# 1.67 08-Jul-2019 mpi

Untangle code setting the scheduling priority of a thread.

- `p_estcpu' and `p_usrpri' represent the priority and are now only set
in a single function.

- Call resched_proc() after updating the priority and stop calling it
from schedclock() since `spc_curpriority' should match curproc's priority.

- Rename updatepri() to match decay_cpu() and stop updating per-thread
member.

- Merge two resched_proc() in one inside setrunnable().

Tweak and ok visa@


# 1.66 24-Jun-2019 visa

Guard uvm_map_protect() with kernel lock to prepare dosetrlimit()
for unlocking.

OK semarie@ mpi@ deraadt@ anton@


# 1.65 21-Jun-2019 visa

Make resource limit access MP-safe. So far, the copy-on-write sharing
of resource limit structs has been done between processes. By applying
copy-on-write also between threads, threads can read rlimits in
a nearly lock-free manner.

Inspired by code in DragonFly BSD and FreeBSD.

OK mpi@, agreement from jmatthew@ and anton@


# 1.64 10-Jun-2019 visa

Avoid changing resource limits in rucheck() by introducing a new state
variable that tracks when to send next SIGXCPU. This eases MP work and
prevents accidental alteration of shared resource limit structs.

OK mpi@ semarie@


# 1.63 02-Jun-2019 visa

Move initialization of limit0 into a dedicated function. This new
function is also a proper place for setting up the plimit pool.

While here, raise the IPL of the plimit pool to IPL_MPFLOOR, needed
in upcoming MP work.

OK claudio@


# 1.62 01-Jun-2019 mpi

Revert to using the SCHED_LOCK() to protect time accounting.

It currently creates a lock ordering problem because SCHED_LOCK() is taken
by hardclock(). That means the "priorities" of a thread should be moved
out of the SCHED_LOCK() first in order to make progress.

Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com
via anton@ as well as by kettenis@


# 1.61 31-May-2019 mpi

Use a per-process mutex to protect time accounting instead of SCHED_LOCK().

Note that hardclock(9) still increments p_{u,s,i}ticks without holding a
lock.

ok visa@, cheloha@


# 1.60 31-May-2019 visa

Rename struct plimit field p_refcnt to pl_refcnt to avoid confusion
with the fields of struct proc. Make pl_refcnt unsigned for upcoming
atomic updating.

OK deraadt@ guenther@


Revision tags: OPENBSD_6_5_BASE
# 1.59 06-Jan-2019 visa

Fix unsafe use of ptsignal() in mi_switch().

ptsignal() has to be called with the kernel lock held. As ensuring the
locking in mi_switch() is not easy, and deferring the signaling using
the task API is not possible because of lock order issues in
mi_switch(), move the CPU time checking into a periodic timer where
the kernel can be locked without issues.

With this change, each process has a dedicated resource check timer.
The timer gets activated only when a CPU time limit is set. Because the
checking is not done as frequently as before, some precision is lost.

Use of timers adapted from FreeBSD.

OK tedu@

Reported-by: syzbot+2f5d62256e3280634623@syzkaller.appspotmail.com


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.58 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.

No objection from millert@, ok tedu@, bluhm@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.57 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.56 25-Aug-2016 dlg

pool_setipl

ok kettenis@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.55 05-Dec-2015 tedu

remove stale lint annotations


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.54 09-Feb-2015 miod

Stop using USRSTACK as the edge of the stack, but rather use the vmspace
vm_minsaddr or vm_maxsaddr, depending upon the direction the stack goes in.

This should have no effect on the existing behaviour.

ok kettenis@ deraadt@


# 1.53 19-Dec-2014 tedu

start retiring the nointr allocator. specify PR_WAITOK as a flag as a
marker for which pools are not interrupt safe. ok dlg


# 1.52 10-Dec-2014 tedu

convert bcopy to memcpy. ok millert


# 1.51 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.50 30-Mar-2014 guenther

Eliminates struct pcred by moving the real and saved ugids into
struct ucred; struct process then directly links to the ucred

Based on a discussion at c2k10 or so before noting that FreeBSD and
NetBSD did this too.

ok matthew@


Revision tags: OPENBSD_5_5_BASE
# 1.49 24-Jan-2014 guenther

exit1() needs to do a final aggregation of the thread's [us]ticks
and runtime to the process totals. Also, add ktracing of struct
rusage in wait4() and getrusage().

problem pointed out by tedu@
ok deraadt@


# 1.48 21-Jan-2014 tedu

bzero -> memset


# 1.47 20-Jan-2014 guenther

Threads can't be zombies, only processes, so change zombproc to zombprocess,
make it a list of processes, and change P_NOZOMBIE and P_STOPPED from thread
flags to process flags. Add allprocess list for the code that just wants
to see processes.

ok tedu@


# 1.46 25-Oct-2013 guenther

Move the declarations for dogetrusage(), itimerround(), and dowait4()
to sys/*.h headers so that the compat/linux code can use them.
Change dowait4() to not copyout() the status value, but rather leave
that for its caller, as compat/linux has to translate it, with the
side benefit of simplifying the native code.

Originally written months ago as part of the time_t work; long
memory, prodding, and ok from pirofti@


# 1.45 14-Sep-2013 guenther

Eliminate the unused retval argument from dogetrusage()


# 1.44 14-Sep-2013 guenther

Snapshots for all archs have been built, so remove the T32 code


# 1.43 13-Aug-2013 guenther

Switch time_t, ino_t, clock_t, and struct kevent's ident and data
members to 64bit types. Assign new syscall numbers for (almost
all) the syscalls that involve the affected types, including anything
with time_t, timeval, itimerval, timespec, rusage, dirent, stat,
or kevent arguments. Add a d_off member to struct dirent and replace
getdirentries() with getdents(), thus immensely simplifying and
accelerating telldir/seekdir. Build perl with -DBIG_TIME.

Bump the major on every single base library: the compat bits included
here are only good enough to make the transition; the T32 compat
option will be burned as soon as we've reached the new world and
are happy with the snapshots for all architectures.

DANGER: ABI incompatibility. Updating to this kernel requires extra
work or you won't be able to login: install a snapshot instead.

Much assistance in fixing userland issues from deraadt@ and tedu@
and build assistance from todd@ and otto@


Revision tags: OPENBSD_5_4_BASE
# 1.42 03-Jun-2013 guenther

Convert some internal APIs to use timespecs instead of timevals

ok matthew@ deraadt@


# 1.41 01-Apr-2013 guenther

Make setrlimit() return EINVAL if rlim_cur > rlim_max, per POSIX.
Use limfree() instead of decrementing the reference counter directly.

ok kettenis@
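
The check described in 1.41 can be observed from userland. The following is a minimal sketch, not code from the commit: the helper name try_inverted_nofile_limit() and the choice of RLIMIT_NOFILE with the values 64 and 128 are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>

/*
 * Ask for a soft limit above the hard limit and report what happened.
 * Returns 0 if the request was accepted, otherwise the errno value set
 * by setrlimit(2).  Hypothetical helper, for illustration only.
 */
int
try_inverted_nofile_limit(void)
{
	struct rlimit rl;

	/* Build a deliberately invalid pair: rlim_cur > rlim_max. */
	rl.rlim_max = 64;
	rl.rlim_cur = 128;
	assert(rl.rlim_cur > rl.rlim_max);

	if (setrlimit(RLIMIT_NOFILE, &rl) == -1)
		return errno;
	return 0;
}
```

On a kernel that applies the rule above, the helper is expected to return EINVAL and leave the limits of the calling process unchanged.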


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.40 10-Apr-2012 guenther

Make the KERN_NPROCS and KERN_MAXPROC sysctl()s and the RLIMIT_NPROC rlimit
count processes instead of threads. New sysctl()s KERN_NTHREADS and
KERN_MAXTHREAD count and limit threads. The nprocs and maxproc kernel
variables are replaced by nprocess, maxprocess, nthreads, and maxthread.

ok tedu@ mikeb@


# 1.39 23-Mar-2012 guenther

Make rusage totals, itimers, and profile settings per-process instead
of per-rthread. Handling of per-thread tick and runtime counters
inspired by how FreeBSD does it.

ok kettenis@


# 1.38 19-Mar-2012 guenther

Add tracing and dumping of "pointer to struct" syscall arguments for
structs timespec, timeval, sigaction, and rlimit.

ok otto@ jsing@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.37 07-Mar-2011 guenther

The scheduling 'nice' value is per-process, not per-thread, so move it
into struct process.

ok tedu@ deraadt@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.36 26-Jul-2010 guenther

Correct the links between threads, processes, pgrps, and sessions,
so that the process-level stuff is to/from struct process and not
struct proc. This fixes a bunch of problem cases in rthreads.
Based on earlier work by blambert and myself, but mostly written
at c2k10.

Tested by many: deraadt, sthen, krw, ray, and in snapshots


# 1.35 29-Jun-2010 guenther

Eliminate struct plimit's PL_SHAREMOD flag: it was for COMPAT_IRIX
sproc() support, but we don't have COMPAT_IRIX.
ok krw@ tedu@


Revision tags: OPENBSD_4_7_BASE
# 1.34 04-Jan-2010 guenther

Don't decrement the refcnt on a plimits until after we're done
copying it, so that the process can't sleep in pool_get() and have
the source structure get pool_put() or modified behind its back.

ok deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.33 22-May-2008 thib

Use LIST_FOREACH() instead of handrolling.

From: Pierre Riteau pierre.riteau_att_gmail.com
OK miod@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.32 12-Apr-2007 tedu

move p_limit and p_cred into struct process
leave macros behind for now to keep the commit small
ok art beck miod pedro


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.31 28-Nov-2005 jsg

ansi/deregister.
'go for it' deraadt@


Revision tags: OPENBSD_3_8_BASE
# 1.30 29-May-2005 deraadt

sched work by niklas and art backed out; causes panics


# 1.29 25-May-2005 niklas

This patch is mostly art's work and was done *a year* ago. Art wants to thank
everyone for the prompt review and ok of this work ;-) Yeah, that includes me
too, or maybe especially me. I am sorry.

Change the sched_lock to a mutex. This fixes, among other things, the infamous
"telnet localhost &" problem. The real bug in that case was that the sched_lock
which is by design a non-recursive lock, was recursively acquired, and not
enough releases made us hold the lock in the idle loop, blocking scheduling
on the other processors. Some of the other processors would hold the biglock though,
which made it impossible for cpu 0 to enter the kernel... A nice deadlock.
Let me just say debugging this for days just to realize that it was all fixed
in an old diff no one ever ok'd was somewhat of an anti-climax.

This diff also changes splsched to be correct for all our architectures.


Revision tags: OPENBSD_3_7_BASE
# 1.28 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_6_BASE
# 1.27 13-Jun-2004 niklas

debranch SMP, have fun


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.26 11-Dec-2003 millert

Add id_t type as per POSIX and use it for [gs]etpriority(2).
OK henning@ and deraadt@


# 1.25 11-Dec-2003 millert

POSIX says rlim_t should be unsigned so make it u_quad_t. Also add
POSIX-mandated RLIM_SAVED_MAX and RLIM_SAVED_CUR defines. On OpenBSD
these are identical to RLIM_INFINITY as allowed by POSIX. OK deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.24 01-Sep-2003 henning

match syscallargs comments with reality
from Patrick Latifi <patrick.l@hermes.usherb.ca>
ok jason@ tedu@


# 1.23 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.22 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.21 15-Oct-2002 nordin

Match reality by changing (u_int) -> (int) in comments.


Revision tags: OPENBSD_3_2_BASE
# 1.20 02-Oct-2002 nordin

branches: 1.20.2;
Check for negative values. Inspiration from tedu <grendel@zeitbombe.org>.
ok deraadt@ and art@


# 1.19 21-Jul-2002 art

Map stack pages without VM_PROT_EXECUTE. Notice that right now this
doesn't do anything since no pmap implements exec protection yet.


Revision tags: OPENBSD_3_1_BASE
# 1.18 25-Jan-2002 art

branches: 1.18.2;
Convert plimit allocations to pool.


# 1.17 20-Dec-2001 nordin

Make user/system times increase monotonically. ok deraadt@ and millert@


Revision tags: UBC_BASE
# 1.16 10-Nov-2001 art

branches: 1.16.2;
Move maxdmap and maxsmap to kern_resource.c


# 1.15 06-Nov-2001 miod

Replace inclusion of <vm/foo.h> with the correct <uvm/bar.h> when necessary.
(Look ma, I might have broken the tree)


Revision tags: OPENBSD_3_0_BASE
# 1.14 27-Jun-2001 art

branches: 1.14.2;
remove old vm


# 1.13 26-May-2001 art

Make it a bit more obvious what dosetrlimit does. (shrink).


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE OPENBSD_2_9_BASE
# 1.12 05-May-2000 art

Add limfree prototype to sys/resourcevar.h.


# 1.11 03-Mar-2000 art

Use LIST_ macros instead of internal field names to walk the allproc list.


Revision tags: SMP_BASE kame_19991208
# 1.10 05-Nov-1999 mickey

branches: 1.10.2;
more stack direction fixes; art@ ok


Revision tags: OPENBSD_2_6_BASE
# 1.9 15-Jul-1999 art

vm_offset_t -> {v,p}addr_t ; vm_size_t -> {v,p}size_t


Revision tags: OPENBSD_2_5_BASE
# 1.8 26-Feb-1999 art

uvm allocation and name changes


Revision tags: OPENBSD_2_1_BASE OPENBSD_2_2_BASE OPENBSD_2_3_BASE OPENBSD_2_4_BASE
# 1.7 24-Nov-1996 millert

Sync with NetBSD. Figure NZERO into priorities and that rlim_cur
and rlim_max are >0.


Revision tags: OPENBSD_2_0_BASE
# 1.6 27-Jul-1996 deraadt

sec can be a long


# 1.5 02-Jul-1996 deraadt

unsigned usec can go negative, should be added in as is; netbsd pr#2585; Juergen.Fluk@lrz.tu-muenchen.de


# 1.4 20-Jun-1996 deraadt

calcru() must calculate using u_quad_t to avoid overflows; netbsd pr#2496, brb@exp.com


# 1.3 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.2 14-Dec-1995 deraadt

from netbsd; limfree()


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.70 07-Dec-2020 mpi

Convert the per-process thread list into a SMR_TAILQ.

Currently all iterations are done under KERNEL_LOCK() and therefore use
the *_LOCKED() variant.

From and ok claudio@


Revision tags: OPENBSD_6_8_BASE
# 1.69 25-Sep-2020 cheloha

setpriority(2): don't treat booleans as scalars

The variable "found" in sys_setpriority() is used as a boolean.
We should set it to 1 to indicate that we found the object we
were looking for instead of incrementing it.

deraadt@ notes that the current code is not buggy, because OpenBSD
cannot support anywhere near 2^32 processes, but agrees that
incrementing the variable signals the wrong thing to the reader.

ok millert@ deraadt@
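
The point made in 1.69 can be sketched outside the kernel. This is an illustrative stand-in, not the code of sys_setpriority(): the helper id_present() and its array-scanning loop are assumptions chosen only to show a flag being assigned rather than incremented.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Report whether `target' occurs in ids[0..n-1].
 * Hypothetical helper, for illustration only.
 */
int
id_present(const int *ids, size_t n, int target)
{
	int found = 0;
	size_t i;

	assert(ids != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (ids[i] == target)
			found = 1;	/* a flag, not a counter: assign, don't increment */
	}
	return found;
}
```

Written this way the result is always 0 or 1 regardless of how many entries match, which is what a reader expects from a variable used as a boolean.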


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.68 15-Jul-2019 mpi

Stop calling resched_proc() after changing the nice(3) value of a process.

Changing the scheduling priority of a process happens rarely, so it isn't
strictly necessary to update the current priority of every thread
instantly.

Moreover resched_proc() isn't well suited to perform this action: it doesn't
consider the state of each thread nor move them to another runqueue.

ok visa@


# 1.69 25-Sep-2020 cheloha

setpriority(2): don't treat booleans as scalars

The variable "found" in sys_setpriority() is used as a boolean.
We should set it to 1 to indicate that we found the object we
were looking for instead of incrementing it.

deraadt@ notes that the current code is not buggy, because OpenBSD
cannot support anywhere near 2^32 processes, but agrees that
incrementing the variable signals the wrong thing to the reader.

ok millert@ deraadt@


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.68 15-Jul-2019 mpi

Stop calling resched_proc() after changing the nice(3) value of a process.

Changing the scheduling priority of a process happens rarely, so it isn't
strictly necessary to update the current priority of every threads
instantly.

Moreover resched_proc() isn't well suited to perform this action: it doesn't
consider the state of each thread nor move them to another runqueue.

ok visa@


# 1.67 08-Jul-2019 mpi

Untangle code setting the scheduling priority of a thread.

- `p_estcpu' and `p_usrpri' represent the priority and are now only set
in a single function.

- Call resched_proc() after updating the priority and stop calling it
from schedclock() since `spc_curpriority' should match curproc's priority.

- Rename updatepri() to match decay_cpu() and stop updating per-thread
member.

- Merge two resched_proc() in one inside setrunnable().

Tweak and ok visa@


# 1.66 24-Jun-2019 visa

Guard uvm_map_protect() with kernel lock to prepare dosetrlimit()
for unlocking.

OK semarie@ mpi@ deraadt@ anton@


# 1.65 21-Jun-2019 visa

Make resource limit access MP-safe. So far, the copy-on-write sharing
of resource limit structs has been done between processes. By applying
copy-on-write also between threads, threads can read rlimits in
a nearly lock-free manner.

Inspired by code in DragonFly BSD and FreeBSD.

OK mpi@, agreement from jmatthew@ and anton@


# 1.64 10-Jun-2019 visa

Avoid changing resource limits in rucheck() by introducing a new state
variable that tracks when to send next SIGXCPU. This eases MP work and
prevents accidental alteration of shared resource limit structs.

OK mpi@ semarie@


# 1.63 02-Jun-2019 visa

Move initialization of limit0 into a dedicated function. This new
function is also a proper place for setting up the plimit pool.

While here, raise the IPL of the plimit pool to IPL_MPFLOOR, needed
in upcoming MP work.

OK claudio@


# 1.62 01-Jun-2019 mpi

Revert to using the SCHED_LOCK() to protect time accounting.

It currently creates a lock ordering problem because SCHED_LOCK() is taken
by hardclock(). That means the "priorities" of a thread should be moved
out of the SCHED_LOCK() first in order to make progress.

Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com
via anton@ as well as by kettenis@


# 1.61 31-May-2019 mpi

Use a per-process mutex to protect time accounting instead of SCHED_LOCK().

Note that hardclock(9) still increments p_{u,s,i}ticks without holding a
lock.

ok visa@, cheloha@


# 1.60 31-May-2019 visa

Rename struct plimit field p_refcnt to pl_refcnt to avoid confusion
with the fields of struct proc. Make pl_refcnt unsigned for upcoming
atomic updating.

OK deraadt@ guenther@


Revision tags: OPENBSD_6_5_BASE
# 1.59 06-Jan-2019 visa

Fix unsafe use of ptsignal() in mi_switch().

ptsignal() has to be called with the kernel lock held. As ensuring the
locking in mi_switch() is not easy, and deferring the signaling using
the task API is not possible because of lock order issues in
mi_switch(), move the CPU time checking into a periodic timer where
the kernel can be locked without issues.

With this change, each process has a dedicated resource check timer.
The timer gets activated only when a CPU time limit is set. Because the
checking is not done as frequently as before, some precision is lost.

Use of timers adapted from FreeBSD.

OK tedu@

Reported-by: syzbot+2f5d62256e3280634623@syzkaller.appspotmail.com


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.58 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.

No objection from millert@, ok tedu@, bluhm@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.57 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.56 25-Aug-2016 dlg

pool_setipl

ok kettenis@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.55 05-Dec-2015 tedu

remove stale lint annotations


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.54 09-Feb-2015 miod

Stop using USRSTACK as the edge of the stack, but rather use the vmspace
vm_minsaddr or vm_maxsaddr, depending upon the direction the stack goes in.

This should have no effect on the existing behaviour.

ok kettenis@ deraadt@


# 1.53 19-Dec-2014 tedu

start retiring the nointr allocator. specify PR_WAITOK as a flag as a
marker for which pools are not interrupt safe. ok dlg


# 1.52 10-Dec-2014 tedu

convert bcopy to memcpy. ok millert


# 1.51 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.50 30-Mar-2014 guenther

Eliminates struct pcred by moving the real and saved ugids into
struct ucred; struct process then directly links to the ucred

Based on a discussion at c2k10 or so before noting that FreeBSD and
NetBSD did this too.

ok matthew@


Revision tags: OPENBSD_5_5_BASE
# 1.49 24-Jan-2014 guenther

exit1() needs to do a final aggregation of the thread's [us]ticks
and runtime to the process totals. Also, add ktracing of struct
rusage in wait4() and getrusage().

problem pointed out by tedu@
ok deraadt@


# 1.48 21-Jan-2014 tedu

bzero -> memset


# 1.47 20-Jan-2014 guenther

Threads can't be zombies, only processes, so change zombproc to zombprocess,
make it a list of processes, and change P_NOZOMBIE and P_STOPPED from thread
flags to process flags. Add allprocess list for the code that just wants
to see processes.

ok tedu@


# 1.46 25-Oct-2013 guenther

Move the declarations for dogetrusage(), itimerround(), and dowait4()
to sys/*.h headers so that the compat/linux code can use them.
Change dowait4() to not copyout() the status value, but rather leave
that for its caller, as compat/linux has to translate it, with the
side benefit of simplifying the native code.

Originally written months ago as part of the time_t work; long
memory, prodding, and ok from pirofti@


# 1.45 14-Sep-2013 guenther

Eliminate the unused retval argument from dogetrusage()


# 1.44 14-Sep-2013 guenther

Snapshots for all archs have been built, so remove the T32 code


# 1.43 13-Aug-2013 guenther

Switch time_t, ino_t, clock_t, and struct kevent's ident and data
members to 64bit types. Assign new syscall numbers for (almost
all) the syscalls that involve the affected types, including anything
with time_t, timeval, itimerval, timespec, rusage, dirent, stat,
or kevent arguments. Add a d_off member to struct dirent and replace
getdirentries() with getdents(), thus immensely simplifying and
accelerating telldir/seekdir. Build perl with -DBIG_TIME.

Bump the major on every single base library: the compat bits included
here are only good enough to make the transition; the T32 compat
option will be burned as soon as we've reached the new world and
are happy with the snapshots for all architectures.

DANGER: ABI incompatibility. Updating to this kernel requires extra
work or you won't be able to login: install a snapshot instead.

Much assistance in fixing userland issues from deraadt@ and tedu@
and build assistance from todd@ and otto@


Revision tags: OPENBSD_5_4_BASE
# 1.42 03-Jun-2013 guenther

Convert some internal APIs to use timespecs instead of timevals

ok matthew@ deraadt@


# 1.41 01-Apr-2013 guenther

Make setrlimit() return EINVAL if rlim_cur > rlim_max, per POSIX.
Use limfree() instead of decrementing the reference counter directly.

ok kettenis@
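
The check described in 1.41 can be illustrated with a small userland sketch. This is not the kernel code: check_rlimit() is a hypothetical helper invented here, and it assumes only that a soft limit above the hard limit must be rejected with EINVAL, as the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical helper (not from kern_resource.c): reject a limit pair
 * whose soft limit exceeds its hard limit, as POSIX requires of
 * setrlimit().  Returns 0 when the pair is acceptable, EINVAL otherwise.
 */
static int
check_rlimit(const struct rlimit *rl)
{
	assert(rl != NULL);

	if (rl->rlim_cur > rl->rlim_max)
		return EINVAL;
	return 0;
}
```

A caller would run such a check before touching any shared limit structure, so that an invalid request fails without side effects.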


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.40 10-Apr-2012 guenther

Make the KERN_NPROCS and KERN_MAXPROC sysctl()s and the RLIMIT_NPROC rlimit
count processes instead of threads. New sysctl()s KERN_NTHREADS and
KERN_MAXTHREAD count and limit threads. The nprocs and maxproc kernel
variables are replaced by nprocess, maxprocess, nthreads, and maxthread.

ok tedu@ mikeb@


# 1.39 23-Mar-2012 guenther

Make rusage totals, itimers, and profile settings per-process instead
of per-rthread. Handling of per-thread tick and runtime counters
inspired by how FreeBSD does it.

ok kettenis@


# 1.38 19-Mar-2012 guenther

Add tracing and dumping of "pointer to struct" syscall arguments for
structs timespec, timeval, sigaction, and rlimit.

ok otto@ jsing@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.37 07-Mar-2011 guenther

The scheduling 'nice' value is per-process, not per-thread, so move it
into struct process.

ok tedu@ deraadt@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.36 26-Jul-2010 guenther

Correct the links between threads, processes, pgrps, and sessions,
so that the process-level stuff is to/from struct process and not
struct proc. This fixes a bunch of problem cases in rthreads.
Based on earlier work by blambert and myself, but mostly written
at c2k10.

Tested by many: deraadt, sthen, krw, ray, and in snapshots


# 1.35 29-Jun-2010 guenther

Eliminate struct plimit's PL_SHAREMOD flag: it was for COMPAT_IRIX
sproc() support, but we don't have COMPAT_IRIX.
ok krw@ tedu@


Revision tags: OPENBSD_4_7_BASE
# 1.34 04-Jan-2010 guenther

Don't decrement the refcnt on a plimits until after we're done
copying it, so that the process can't sleep in pool_get() and have
the source structure get pool_put() or modified behind its back.

ok deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.33 22-May-2008 thib

Use LIST_FOREACH() instead of handrolling.

From: Pierre Riteau pierre.riteau_att_gmail.com
OK miod@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.32 12-Apr-2007 tedu

move p_limit and p_cred into struct process
leave macros behind for now to keep the commit small
ok art beck miod pedro


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.31 28-Nov-2005 jsg

ansi/deregister.
'go for it' deraadt@


Revision tags: OPENBSD_3_8_BASE
# 1.30 29-May-2005 deraadt

sched work by niklas and art backed out; causes panics


# 1.29 25-May-2005 niklas

This patch is mostly art's work and was done *a year* ago. Art wants to thank
everyone for the prompt review and ok of this work ;-) Yeah, that includes me
too, or maybe especially me. I am sorry.

Change the sched_lock to a mutex. This fixes, among other things, the infamous
"telnet localhost &" problem. The real bug in that case was that the sched_lock
which is by design a non-recursive lock, was recursively acquired, and not
enough releases made us hold the lock in the idle loop, blocking scheduling
on the other processors. Some of the other processors would hold the biglock though,
which made it impossible for cpu 0 to enter the kernel... A nice deadlock.
Let me just say debugging this for days just to realize that it was all fixed
in an old diff no one ever ok'd was somewhat of an anti-climax.

This diff also changes splsched to be correct for all our architectures.


Revision tags: OPENBSD_3_7_BASE
# 1.28 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_6_BASE
# 1.27 13-Jun-2004 niklas

debranch SMP, have fun


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.26 11-Dec-2003 millert

Add id_t type as per POSIX and use it for [gs]etpriority(2).
OK henning@ and deraadt@


# 1.25 11-Dec-2003 millert

POSIX says rlim_t should be unsigned so make it u_quad_t. Also add
POSIX-mandated RLIM_SAVED_MAX and RLIM_SAVED_CUR defines. On OpenBSD
these are identical to RLIM_INFINITY as allowed by POSIX. OK deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.24 01-Sep-2003 henning

match syscallargs comments with reality
from Patrick Latifi <patrick.l@hermes.usherb.ca>
ok jason@ tedu@


# 1.23 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.22 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.21 15-Oct-2002 nordin

Match reality by changing (u_int) -> (int) in comments.


Revision tags: OPENBSD_3_2_BASE
# 1.20 02-Oct-2002 nordin

branches: 1.20.2;
Check for negative values. Inspiration from tedu <grendel@zeitbombe.org>.
ok deraadt@ and art@


# 1.19 21-Jul-2002 art

Map stack pages without VM_PROT_EXECUTE. Notice that right now this
doesn't do anything since no pmap implements exec protection yet.


Revision tags: OPENBSD_3_1_BASE
# 1.18 25-Jan-2002 art

branches: 1.18.2;
Convert plimit allocations to pool.


# 1.17 20-Dec-2001 nordin

Make user/system times increase monotonically. ok deraadt@ and millert@


Revision tags: UBC_BASE
# 1.16 10-Nov-2001 art

branches: 1.16.2;
Move maxdmap and maxsmap to kern_resource.c


# 1.15 06-Nov-2001 miod

Replace inclusion of <vm/foo.h> with the correct <uvm/bar.h> when necessary.
(Look ma, I might have broken the tree)


Revision tags: OPENBSD_3_0_BASE
# 1.14 27-Jun-2001 art

branches: 1.14.2;
remove old vm


# 1.13 26-May-2001 art

Make it a bit more obvious what dosetrlimit does. (shrink).


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE OPENBSD_2_9_BASE
# 1.12 05-May-2000 art

Add limfree prototype to sys/resourcevar.h.


# 1.11 03-Mar-2000 art

Use LIST_ macros instead of internal field names to walk the allproc list.


Revision tags: SMP_BASE kame_19991208
# 1.10 05-Nov-1999 mickey

branches: 1.10.2;
more stack direction fixes; art@ ok


Revision tags: OPENBSD_2_6_BASE
# 1.9 15-Jul-1999 art

vm_offset_t -> {v,p}addr_t ; vm_size_t -> {v,p}size_t


Revision tags: OPENBSD_2_5_BASE
# 1.8 26-Feb-1999 art

uvm allocation and name changes


Revision tags: OPENBSD_2_1_BASE OPENBSD_2_2_BASE OPENBSD_2_3_BASE OPENBSD_2_4_BASE
# 1.7 24-Nov-1996 millert

Sync with NetBSD. Figure NZERO into priorities and that rlim_cur
and rlim_max are >0.


Revision tags: OPENBSD_2_0_BASE
# 1.6 27-Jul-1996 deraadt

sec can be a long


# 1.5 02-Jul-1996 deraadt

unsigned usec can go negative, should be added in as is; netbsd pr#2585; Juergen.Fluk@lrz.tu-muenchen.de


# 1.4 20-Jun-1996 deraadt

calcru() must calculate using u_quad_t to avoid overflows; netbsd pr#2496, brb@exp.com
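
The kind of overflow 1.4 refers to can be sketched as follows. This is an assumption-laden illustration rather than the actual calcru() code: scale_usec() and its parameters are hypothetical names, and uint64_t stands in for u_quad_t. Splitting a total runtime in proportion to tick counts means multiplying before dividing, and that product can exceed 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from kern_resource.c): return the share of
 * total_usec that corresponds to `ticks` out of `total_ticks`.
 * The multiplication is done in 64 bits so the intermediate product
 * does not wrap the way a 32-bit computation would.
 */
static uint64_t
scale_usec(uint64_t total_usec, uint32_t ticks, uint32_t total_ticks)
{
	assert(ticks <= total_ticks);

	if (total_ticks == 0)
		return 0;
	return (total_usec * ticks) / total_ticks;
}
```

With a runtime of 5,000,000,000 microseconds the total alone no longer fits in 32 bits, which is why the wider type matters here.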


# 1.3 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.2 14-Dec-1995 deraadt

from netbsd; limfree()


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.67 08-Jul-2019 mpi

Untangle code setting the scheduling priority of a thread.

- `p_estcpu' and `p_usrpri' represent the priority and are now only set
in a single function.

- Call resched_proc() after updating the priority and stop calling it
from schedclock() since `spc_curpriority' should match curproc's priority.

- Rename updatepri() to match decay_cpu() and stop updating per-thread
member.

- Merge two resched_proc() in one inside setrunnable().

Tweak and ok visa@


# 1.66 24-Jun-2019 visa

Guard uvm_map_protect() with kernel lock to prepare dosetrlimit()
for unlocking.

OK semarie@ mpi@ deraadt@ anton@


# 1.65 21-Jun-2019 visa

Make resource limit access MP-safe. So far, the copy-on-write sharing
of resource limit structs has been done between processes. By applying
copy-on-write also between threads, threads can read rlimits in
a nearly lock-free manner.

Inspired by code in DragonFly BSD and FreeBSD.

OK mpi@, agreement from jmatthew@ and anton@


# 1.64 10-Jun-2019 visa

Avoid changing resource limits in rucheck() by introducing a new state
variable that tracks when to send next SIGXCPU. This eases MP work and
prevents accidental alteration of shared resource limit structs.

OK mpi@ semarie@


# 1.63 02-Jun-2019 visa

Move initialization of limit0 into a dedicated function. This new
function is also a proper place for setting up the plimit pool.

While here, raise the IPL of the plimit pool to IPL_MPFLOOR, needed
in upcoming MP work.

OK claudio@


# 1.62 01-Jun-2019 mpi

Revert to using the SCHED_LOCK() to protect time accounting.

It currently creates a lock ordering problem because SCHED_LOCK() is taken
by hardclock(). That means the "priorities" of a thread should be moved
out of the SCHED_LOCK() first in order to make progress.

Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com
via anton@ as well as by kettenis@


# 1.61 31-May-2019 mpi

Use a per-process mutex to protect time accounting instead of SCHED_LOCK().

Note that hardclock(9) still increments p_{u,s,i}ticks without holding a
lock.

ok visa@, cheloha@


# 1.60 31-May-2019 visa

Rename struct plimit field p_refcnt to pl_refcnt to avoid confusion
with the fields of struct proc. Make pl_refcnt unsigned for upcoming
atomic updating.

OK deraadt@ guenther@


Revision tags: OPENBSD_6_5_BASE
# 1.59 06-Jan-2019 visa

Fix unsafe use of ptsignal() in mi_switch().

ptsignal() has to be called with the kernel lock held. As ensuring the
locking in mi_switch() is not easy, and deferring the signaling using
the task API is not possible because of lock order issues in
mi_switch(), move the CPU time checking into a periodic timer where
the kernel can be locked without issues.

With this change, each process has a dedicated resource check timer.
The timer gets activated only when a CPU time limit is set. Because the
checking is not done as frequently as before, some precision is lost.

Use of timers adapted from FreeBSD.

OK tedu@

Reported-by: syzbot+2f5d62256e3280634623@syzkaller.appspotmail.com


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.58 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer mess with a per-process field.

No objection from millert@, ok tedu@, bluhm@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.57 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.56 25-Aug-2016 dlg

pool_setipl

ok kettenis@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.55 05-Dec-2015 tedu

remove stale lint annotations


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.54 09-Feb-2015 miod

Stop using USRSTACK as the edge of the stack, but rather use the vmspace
vm_minsaddr or vm_maxsaddr, depending upon the direction the stack goes in.

This should have no effect on the existing behaviour.

ok kettenis@ deraadt@


# 1.53 19-Dec-2014 tedu

start retiring the nointr allocator. specify PR_WAITOK as a flag as a
marker for which pools are not interrupt safe. ok dlg


# 1.52 10-Dec-2014 tedu

convert bcopy to memcpy. ok millert


# 1.51 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.50 30-Mar-2014 guenther

Eliminate struct pcred by moving the real and saved ugids into
struct ucred; struct process then directly links to the ucred.

Based on a discussion at c2k10 or so before noting that FreeBSD and
NetBSD did this too.

ok matthew@


Revision tags: OPENBSD_5_5_BASE
# 1.49 24-Jan-2014 guenther

exit1() needs to do a final aggregation of the thread's [us]ticks
and runtime to the process totals. Also, add ktracing of struct
rusage in wait4() and getrusage().

problem pointed out by tedu@
ok deraadt@


# 1.48 21-Jan-2014 tedu

bzero -> memset


# 1.47 20-Jan-2014 guenther

Threads can't be zombies, only processes, so change zombproc to zombprocess,
make it a list of processes, and change P_NOZOMBIE and P_STOPPED from thread
flags to process flags. Add allprocess list for the code that just wants
to see processes.

ok tedu@


# 1.46 25-Oct-2013 guenther

Move the declarations for dogetrusage(), itimerround(), and dowait4()
to sys/*.h headers so that the compat/linux code can use them.
Change dowait4() to not copyout() the status value, but rather leave
that for its caller, as compat/linux has to translate it, with the
side benefit of simplifying the native code.

Originally written months ago as part of the time_t work; long
memory, prodding, and ok from pirofti@


# 1.45 14-Sep-2013 guenther

Eliminate the unused retval argument from dogetrusage()


# 1.44 14-Sep-2013 guenther

Snapshots for all archs have been built, so remove the T32 code


# 1.43 13-Aug-2013 guenther

Switch time_t, ino_t, clock_t, and struct kevent's ident and data
members to 64-bit types. Assign new syscall numbers for (almost
all) the syscalls that involve the affected types, including anything
with time_t, timeval, itimerval, timespec, rusage, dirent, stat,
or kevent arguments. Add a d_off member to struct dirent and replace
getdirentries() with getdents(), thus immensely simplifying and
accelerating telldir/seekdir. Build perl with -DBIG_TIME.

Bump the major on every single base library: the compat bits included
here are only good enough to make the transition; the T32 compat
option will be burned as soon as we've reached the new world and
are happy with the snapshots for all architectures.

DANGER: ABI incompatibility. Updating to this kernel requires extra
work or you won't be able to login: install a snapshot instead.

Much assistance in fixing userland issues from deraadt@ and tedu@
and build assistance from todd@ and otto@


Revision tags: OPENBSD_5_4_BASE
# 1.42 03-Jun-2013 guenther

Convert some internal APIs to use timespecs instead of timevals

ok matthew@ deraadt@


# 1.41 01-Apr-2013 guenther

Make setrlimit() return EINVAL if rlim_cur > rlim_max, per POSIX.
Use limfree() instead of decrementing the reference counter directly.

ok kettenis@


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.40 10-Apr-2012 guenther

Make the KERN_NPROCS and KERN_MAXPROC sysctl()s and the RLIMIT_NPROC rlimit
count processes instead of threads. New sysctl()s KERN_NTHREADS and
KERN_MAXTHREAD count and limit threads. The nprocs and maxproc kernel
variables are replaced by nprocess, maxprocess, nthreads, and maxthread.

ok tedu@ mikeb@


# 1.39 23-Mar-2012 guenther

Make rusage totals, itimers, and profile settings per-process instead
of per-rthread. Handling of per-thread tick and runtime counters
inspired by how FreeBSD does it.

ok kettenis@


# 1.38 19-Mar-2012 guenther

Add tracing and dumping of "pointer to struct" syscall arguments for
structs timespec, timeval, sigaction, and rlimit.

ok otto@ jsing@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.37 07-Mar-2011 guenther

The scheduling 'nice' value is per-process, not per-thread, so move it
into struct process.

ok tedu@ deraadt@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.36 26-Jul-2010 guenther

Correct the links between threads, processes, pgrps, and sessions,
so that the process-level stuff is to/from struct process and not
struct proc. This fixes a bunch of problem cases in rthreads.
Based on earlier work by blambert and myself, but mostly written
at c2k10.

Tested by many: deraadt, sthen, krw, ray, and in snapshots


# 1.35 29-Jun-2010 guenther

Eliminate struct plimit's PL_SHAREMOD flag: it was for COMPAT_IRIX
sproc() support, but we don't have COMPAT_IRIX.
ok krw@ tedu@


Revision tags: OPENBSD_4_7_BASE
# 1.34 04-Jan-2010 guenther

Don't decrement the refcnt on a plimits until after we're done
copying it, so that the process can't sleep in pool_get() and have
the source structure get pool_put() or modified behind its back.

ok deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.33 22-May-2008 thib

Use LIST_FOREACH() instead of handrolling.

From: Pierre Riteau pierre.riteau_att_gmail.com
OK miod@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.32 12-Apr-2007 tedu

move p_limit and p_cred into struct process
leave macros behind for now to keep the commit small
ok art beck miod pedro


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.31 28-Nov-2005 jsg

ansi/deregister.
'go for it' deraadt@


Revision tags: OPENBSD_3_8_BASE
# 1.30 29-May-2005 deraadt

sched work by niklas and art backed out; causes panics


# 1.29 25-May-2005 niklas

This patch is mostly art's work and was done *a year* ago. Art wants to thank
everyone for the prompt review and ok of this work ;-) Yeah, that includes me
too, or maybe especially me. I am sorry.

Change the sched_lock to a mutex. This fixes, among other things, the infamous
"telnet localhost &" problem. The real bug in that case was that the sched_lock
which is by design a non-recursive lock, was recursively acquired, and not
enough releases made us hold the lock in the idle loop, blocking scheduling
on the other processors. Some of the other processors would hold the biglock though,
which made it impossible for cpu 0 to enter the kernel... A nice deadlock.
Let me just say debugging this for days just to realize that it was all fixed
in an old diff no one ever ok'd was somewhat of an anti-climax.

This diff also changes splsched to be correct for all our architectures.


Revision tags: OPENBSD_3_7_BASE
# 1.28 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_6_BASE
# 1.27 13-Jun-2004 niklas

debranch SMP, have fun


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.26 11-Dec-2003 millert

Add id_t type as per POSIX and use it for [gs]etpriority(2).
OK henning@ and deraadt@


# 1.25 11-Dec-2003 millert

POSIX says rlim_t should be unsigned so make it u_quad_t. Also add
POSIX-mandated RLIM_SAVED_MAX and RLIM_SAVED_CUR defines. On OpenBSD
these are identical to RLIM_INFINITY as allowed by POSIX. OK deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.24 01-Sep-2003 henning

match syscallargs comments with reality
from Patrick Latifi <patrick.l@hermes.usherb.ca>
ok jason@ tedu@


# 1.23 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.22 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.21 15-Oct-2002 nordin

Match reality by changing (u_int) -> (int) in comments.


Revision tags: OPENBSD_3_2_BASE
# 1.20 02-Oct-2002 nordin

branches: 1.20.2;
Check for negative values. Inspiration from tedu <grendel@zeitbombe.org>.
ok deraadt@ and art@


# 1.19 21-Jul-2002 art

Map stack pages without VM_PROT_EXECUTE. Notice that right now this
doesn't do anything since no pmap implements exec protection yet.


Revision tags: OPENBSD_3_1_BASE
# 1.18 25-Jan-2002 art

branches: 1.18.2;
Convert plimit allocations to pool.


# 1.17 20-Dec-2001 nordin

Make user/system times increase monotonically. ok deraadt@ and millert@


Revision tags: UBC_BASE
# 1.16 10-Nov-2001 art

branches: 1.16.2;
Move maxdmap and maxsmap to kern_resource.c


# 1.15 06-Nov-2001 miod

Replace inclusion of <vm/foo.h> with the correct <uvm/bar.h> when necessary.
(Look ma, I might have broken the tree)


Revision tags: OPENBSD_3_0_BASE
# 1.14 27-Jun-2001 art

branches: 1.14.2;
remove old vm


# 1.13 26-May-2001 art

Make it a bit more obvious what dosetrlimit does. (shrink).


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE OPENBSD_2_9_BASE
# 1.12 05-May-2000 art

Add limfree prototype to sys/resourcevar.h.


# 1.11 03-Mar-2000 art

Use LIST_ macros instead of internal field names to walk the allproc list.


Revision tags: SMP_BASE kame_19991208
# 1.10 05-Nov-1999 mickey

branches: 1.10.2;
more stack direction fixes; art@ ok


Revision tags: OPENBSD_2_6_BASE
# 1.9 15-Jul-1999 art

vm_offset_t -> {v,p}addr_t ; vm_size_t -> {v,p}size_t


Revision tags: OPENBSD_2_5_BASE
# 1.8 26-Feb-1999 art

uvm allocation and name changes


Revision tags: OPENBSD_2_1_BASE OPENBSD_2_2_BASE OPENBSD_2_3_BASE OPENBSD_2_4_BASE
# 1.7 24-Nov-1996 millert

Sync with NetBSD. Figure NZERO into priorities and that rlim_cur
and rlim_max are >0.


Revision tags: OPENBSD_2_0_BASE
# 1.6 27-Jul-1996 deraadt

sec can be a long


# 1.5 02-Jul-1996 deraadt

unsigned usec can go negative, should be added in as is; netbsd pr#2585; Juergen.Fluk@lrz.tu-muenchen.de


# 1.4 20-Jun-1996 deraadt

calcru() must calculate using u_quad_t to avoid overflows; netbsd pr#2496, brb@exp.com


# 1.3 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.2 14-Dec-1995 deraadt

from netbsd; limfree()


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.66 24-Jun-2019 visa

Guard uvm_map_protect() with kernel lock to prepare dosetrlimit()
for unlocking.

OK semarie@ mpi@ deraadt@ anton@


# 1.65 21-Jun-2019 visa

Make resource limit access MP-safe. So far, the copy-on-write sharing
of resource limit structs has been done between processes. By applying
copy-on-write also between threads, threads can read rlimits in
a nearly lock-free manner.

Inspired by code in DragonFly BSD and FreeBSD.

OK mpi@, agreement from jmatthew@ and anton@


# 1.64 10-Jun-2019 visa

Avoid changing resource limits in rucheck() by introducing a new state
variable that tracks when to send the next SIGXCPU. This eases MP work and
prevents accidental alteration of shared resource limit structs.

OK mpi@ semarie@


# 1.63 02-Jun-2019 visa

Move initialization of limit0 into a dedicated function. This new
function is also a proper place for setting up the plimit pool.

While here, raise the IPL of the plimit pool to IPL_MPFLOOR, needed
in upcoming MP work.

OK claudio@


# 1.62 01-Jun-2019 mpi

Revert to using the SCHED_LOCK() to protect time accounting.

It currently creates a lock ordering problem because SCHED_LOCK() is taken
by hardclock(). That means the "priorities" of a thread should be moved
out of the SCHED_LOCK() first in order to make progress.

Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com
via anton@ as well as by kettenis@


# 1.61 31-May-2019 mpi

Use a per-process mutex to protect time accounting instead of SCHED_LOCK().

Note that hardclock(9) still increments p_{u,s,i}ticks without holding a
lock.

ok visa@, cheloha@


# 1.60 31-May-2019 visa

Rename struct plimit field p_refcnt to pl_refcnt to avoid confusion
with the fields of struct proc. Make pl_refcnt unsigned for upcoming
atomic updating.

OK deraadt@ guenther@


Revision tags: OPENBSD_6_5_BASE
# 1.59 06-Jan-2019 visa

Fix unsafe use of ptsignal() in mi_switch().

ptsignal() has to be called with the kernel lock held. As ensuring the
locking in mi_switch() is not easy, and deferring the signaling using
the task API is not possible because of lock order issues in
mi_switch(), move the CPU time checking into a periodic timer where
the kernel can be locked without issues.

With this change, each process has a dedicated resource check timer.
The timer gets activated only when a CPU time limit is set. Because the
checking is not done as frequently as before, some precision is lost.

Use of timers adapted from FreeBSD.

OK tedu@

Reported-by: syzbot+2f5d62256e3280634623@syzkaller.appspotmail.com


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.58 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer mess with a per-process field.

No objection from millert@, ok tedu@, bluhm@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.57 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.56 25-Aug-2016 dlg

pool_setipl

ok kettenis@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.55 05-Dec-2015 tedu

remove stale lint annotations


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.54 09-Feb-2015 miod

Stop using USRSTACK as the edge of the stack, but rather use the vmspace
vm_minsaddr or vm_maxsaddr, depending upon the direction the stack goes in.

This should have no effect on the existing behaviourrr.

ok kettenis@ deraadt@


# 1.53 19-Dec-2014 tedu

start retiring the nointr allocator. specify PR_WAITOK as a flag as a
marker for which pools are not interrupt safe. ok dlg


# 1.52 10-Dec-2014 tedu

convert bcopy to memcpy. ok millert


# 1.51 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.50 30-Mar-2014 guenther

Eliminates struct pcred by moving the real and saved ugids into
struct ucred; struct process then directly links to the ucred

Based on a discussion at c2k10 or so before noting that FreeBSD and
NetBSD did this too.

ok matthew@


Revision tags: OPENBSD_5_5_BASE
# 1.49 24-Jan-2014 guenther

exit1() needs to do a final aggregation of the thread's [us]ticks
and runtime to the process totals. Also, add ktracing of struct
rusage in wait4() and getrusage().

problem pointed out by tedu@
ok deraadt@


# 1.48 21-Jan-2014 tedu

bzero -> memset


# 1.47 20-Jan-2014 guenther

Threads can't be zombies, only processes, so change zombproc to zombprocess,
make it a list of processes, and change P_NOZOMBIE and P_STOPPED from thread
flags to process flags. Add allprocess list for the code that just wants
to see processes.

ok tedu@


# 1.46 25-Oct-2013 guenther

Move the declarations for dogetrusage(), itimerround(), and dowait4()
to sys/*.h headers so that the compat/linux code can use them.
Change dowait4() to not copyout() the status value, but rather leave
that for its caller, as compat/linux has to translate it, with the
side benefit of simplifying the native code.

Originally written months ago as part of the time_t work; long
memory, prodding, and ok from pirofti@


# 1.45 14-Sep-2013 guenther

Eliminate the unused retval argument from dogetrusage()


# 1.44 14-Sep-2013 guenther

Snapshots for all archs have been built, so remove the T32 code


# 1.43 13-Aug-2013 guenther

Switch time_t, ino_t, clock_t, and struct kevent's ident and data
members to 64bit types. Assign new syscall numbers for (almost
all) the syscalls that involve the affected types, including anything
with time_t, timeval, itimerval, timespec, rusage, dirent, stat,
or kevent arguments. Add a d_off member to struct dirent and replace
getdirentries() with getdents(), thus immensely simplifying and
accelerating telldir/seekdir. Build perl with -DBIG_TIME.

Bump the major on every single base library: the compat bits included
here are only good enough to make the transition; the T32 compat
option will be burned as soon as we've reached the new world are
are happy with the snapshots for all architectures.

DANGER: ABI incompatibility. Updating to this kernel requires extra
work or you won't be able to login: install a snapshot instead.

Much assistance in fixing userland issues from deraadt@ and tedu@
and build assistance from todd@ and otto@


Revision tags: OPENBSD_5_4_BASE
# 1.42 03-Jun-2013 guenther

Convert some internal APIs to use timespecs instead of timevals

ok matthew@ deraadt@


# 1.41 01-Apr-2013 guenther

Make setrlimit() return EINVAL if rlim_cur > rlim_max, per POSIX.
Use limfree() instead of decrementing the reference counter directly.

ok kettenis@


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.40 10-Apr-2012 guenther

Make the KERN_NPROCS and KERN_MAXPROC sysctl()s and the RLIMIT_NPROC rlimit
count processes instead of threads. New sysctl()s KERN_NTHREADS and
KERN_MAXTHREAD count and limit threads. The nprocs and maxproc kernel
variables are replaced by nprocess, maxprocess, nthreads, and maxthread.

ok tedu@ mikeb@


# 1.39 23-Mar-2012 guenther

Make rusage totals, itimers, and profile settings per-process instead
of per-rthread. Handling of per-thread tick and runtime counters
inspired by how FreeBSD does it.

ok kettenis@


# 1.38 19-Mar-2012 guenther

Add tracing and dumping of "pointer to struct" syscall arguments for
structs timespec, timeval, sigaction, and rlimit.

ok otto@ jsing@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.37 07-Mar-2011 guenther

The scheduling 'nice' value is per-process, not per-thread, so move it
into struct process.

ok tedu@ deraadt@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.36 26-Jul-2010 guenther

Correct the links between threads, processes, pgrps, and sessions,
so that the process-level stuff is to/from struct process and not
struct proc. This fixes a bunch of problem cases in rthreads.
Based on earlier work by blambert and myself, but mostly written
at c2k10.

Tested by many: deraadt, sthen, krw, ray, and in snapshots


# 1.35 29-Jun-2010 guenther

Eliminate struct plimit's PL_SHAREMOD flag: it was for COMPAT_IRIX
sproc() support, but we don't have COMPAT_IRIX.
ok krw@ tedu@


Revision tags: OPENBSD_4_7_BASE
# 1.34 04-Jan-2010 guenther

Don't decrement the refcnt on a plimits until after we're done
copying it, so that the process can't sleep in pool_get() and have
the source structure get pool_put() or modified behind its back.

ok deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.33 22-May-2008 thib

Use LIST_FOREACH() instead of handrolling.

From: Pierre Riteau pierre.riteau_att_gmail.com
OK miod@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.32 12-Apr-2007 tedu

move p_limit and p_cred into struct process
leave macros behind for now to keep the commit small
ok art beck miod pedro


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.31 28-Nov-2005 jsg

ansi/deregister.
'go for it' deraadt@


Revision tags: OPENBSD_3_8_BASE
# 1.30 29-May-2005 deraadt

sched work by niklas and art backed out; causes panics


# 1.29 25-May-2005 niklas

This patch is mortly art's work and was done *a year* ago. Art wants to thank
everyone for the prompt review and ok of this work ;-) Yeah, that includes me
too, or maybe especially me. I am sorry.

Change the sched_lock to a mutex. This fixes, among other things, the infamous
"telnet localhost &" problem. The real bug in that case was that the sched_lock
which is by design a non-recursive lock, was recursively acquired, and not
enough releases made us hold the lock in the idle loop, blocking scheduling
on the other processors. Some of the other processors would hold the biglock though,
which made it impossible for cpu 0 to enter the kernel... A nice deadlock.
Let me just say debugging this for days just to realize that it was all fixed
in an old diff noone ever ok'd was somewhat of an anti-climax.

This diff also changes splsched to be correct for all our architectures.


Revision tags: OPENBSD_3_7_BASE
# 1.28 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_6_BASE
# 1.27 13-Jun-2004 niklas

debranch SMP, have fun


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.26 11-Dec-2003 millert

Add id_t type as per POSIX and use it for [gs]etpriority(2).
OK henning@ and deraadt@


# 1.25 11-Dec-2003 millert

POSIX says rlim_t should be unsigned so make it u_quad_t. Also add
POSIX-mandated RLIM_SAVED_MAX and RLIM_SAVED_CUR defines. On OpenBSD
these are identical to RLIM_INFINITY as allowed by POSIX. OK deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.24 01-Sep-2003 henning

match syscallargs comments with reality
from Patrick Latifi <patrick.l@hermes.usherb.ca>
ok jason@ tedu@


# 1.23 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls user suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.22 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.21 15-Oct-2002 nordin

Match reality by changing (u_int) -> (int) in comments.


Revision tags: OPENBSD_3_2_BASE
# 1.20 02-Oct-2002 nordin

branches: 1.20.2;
Check for negative values. Inspiration from tedu <grendel@zeitbombe.org>.
ok deraadt@ and art@


# 1.19 21-Jul-2002 art

Map stack pages without VM_PROT_EXECUTE. Notice that right now this
doesn't do anything since no pmap implements exec protection yet.


Revision tags: OPENBSD_3_1_BASE
# 1.18 25-Jan-2002 art

branches: 1.18.2;
Convert plimit allocations to pool.


# 1.17 20-Dec-2001 nordin

Make user/system times increase monotonically. ok deraadt@ and millert@


Revision tags: UBC_BASE
# 1.16 10-Nov-2001 art

branches: 1.16.2;
Move maxdmap and maxsmap to kern_resource.c


# 1.15 06-Nov-2001 miod

Replace inclusion of <vm/foo.h> with the correct <uvm/bar.h> when necessary.
(Look ma, I might have broken the tree)


Revision tags: OPENBSD_3_0_BASE
# 1.14 27-Jun-2001 art

branches: 1.14.2;
remove old vm


# 1.13 26-May-2001 art

Make it a bit more obvious what dosetrlimit does. (shrink).


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE OPENBSD_2_9_BASE
# 1.12 05-May-2000 art

Add limfree prototype to sys/recosurcevar.h.


# 1.11 03-Mar-2000 art

Use LIST_ macros instead of internal field names to walk the allproc list.


Revision tags: SMP_BASE kame_19991208
# 1.10 05-Nov-1999 mickey

branches: 1.10.2;
more stack direction fixes; art@ ok


Revision tags: OPENBSD_2_6_BASE
# 1.9 15-Jul-1999 art

vm_offset_t -> {v,p}addr_t ; vm_size_t -> {v,p}size_t


Revision tags: OPENBSD_2_5_BASE
# 1.8 26-Feb-1999 art

uvm allocation and name changes


Revision tags: OPENBSD_2_1_BASE OPENBSD_2_2_BASE OPENBSD_2_3_BASE OPENBSD_2_4_BASE
# 1.7 24-Nov-1996 millert

Sync with NetBSD. Figure NZERO into priorities and that rlim_cur
and rlim_max are >0.


Revision tags: OPENBSD_2_0_BASE
# 1.6 27-Jul-1996 deraadt

sec can be a long


# 1.5 02-Jul-1996 deraadt

unsigned usec can go negative, should be added in as is; netbsd pr#2585; Juergen.Fluk@lrz.tu-muenchen.de


# 1.4 20-Jun-1996 deraadt

calcru() must calculate using u_quad_t to avoid overflows; netbsd pr#2496, brb@exp.com


# 1.3 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.2 14-Dec-1995 deraadt

from netbsd; limfree()


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.65 21-Jun-2019 visa

Make resource limit access MP-safe. So far, the copy-on-write sharing
of resource limit structs has been done between processes. By applying
copy-on-write also between threads, threads can read rlimits in
a nearly lock-free manner.

Inspired by code in DragonFly BSD and FreeBSD.

OK mpi@, agreement from jmatthew@ and anton@


# 1.64 10-Jun-2019 visa

Avoid changing resource limits in rucheck() by introducing a new state
variable that tracks when to send next SIGXCPU. This eases MP work and
prevents accidental alteration of shared resource limit structs.

OK mpi@ semarie@


# 1.63 02-Jun-2019 visa

Move initialization of limit0 into a dedicated function. This new
function is also a proper place for setting up the plimit pool.

While here, raise the IPL of the plimit pool to IPL_MPFLOOR, needed
in upcoming MP work.

OK claudio@


# 1.62 01-Jun-2019 mpi

Revert to using the SCHED_LOCK() to protect time accounting.

It currently creates a lock ordering problem because SCHED_LOCK() is taken
by hardclock(). That means the "priorities" of a thread should be moved
out of the SCHED_LOCK() first in order to make progress.

Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com
via anton@ as well as by kettenis@


# 1.61 31-May-2019 mpi

Use a per-process mutex to protect time accounting instead of SCHED_LOCK().

Note that hardclock(9) still increments p_{u,s,i}ticks without holding a
lock.

ok visa@, cheloha@


# 1.60 31-May-2019 visa

Rename struct plimit field p_refcnt to pl_refcnt to avoid confusion
with the fields of struct proc. Make pl_refcnt unsigned for upcoming
atomic updating.

OK deraadt@ guenther@


Revision tags: OPENBSD_6_5_BASE
# 1.59 06-Jan-2019 visa

Fix unsafe use of ptsignal() in mi_switch().

ptsignal() has to be called with the kernel lock held. As ensuring the
locking in mi_switch() is not easy, and deferring the signaling using
the task API is not possible because of lock order issues in
mi_switch(), move the CPU time checking into a periodic timer where
the kernel can be locked without issues.

With this change, each process has a dedicated resource check timer.
The timer gets activated only when a CPU time limit is set. Because the
checking is not done as frequently as before, some precision is lost.

Use of timers adapted from FreeBSD.

OK tedu@

Reported-by: syzbot+2f5d62256e3280634623@syzkaller.appspotmail.com


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.58 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer mess with a per-process field.

No objection from millert@, ok tedu@, bluhm@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.57 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.56 25-Aug-2016 dlg

pool_setipl

ok kettenis@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.55 05-Dec-2015 tedu

remove stale lint annotations


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.54 09-Feb-2015 miod

Stop using USRSTACK as the edge of the stack, but rather use the vmspace
vm_minsaddr or vm_maxsaddr, depending upon the direction the stack goes in.

This should have no effect on the existing behaviourrr.

ok kettenis@ deraadt@


# 1.53 19-Dec-2014 tedu

start retiring the nointr allocator. specify PR_WAITOK as a flag as a
marker for which pools are not interrupt safe. ok dlg


# 1.52 10-Dec-2014 tedu

convert bcopy to memcpy. ok millert


# 1.51 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.50 30-Mar-2014 guenther

Eliminates struct pcred by moving the real and saved ugids into
struct ucred; struct process then directly links to the ucred

Based on a discussion at c2k10 or so before noting that FreeBSD and
NetBSD did this too.

ok matthew@


Revision tags: OPENBSD_5_5_BASE
# 1.49 24-Jan-2014 guenther

exit1() needs to do a final aggregation of the thread's [us]ticks
and runtime to the process totals. Also, add ktracing of struct
rusage in wait4() and getrusage().

problem pointed out by tedu@
ok deraadt@


# 1.48 21-Jan-2014 tedu

bzero -> memset


# 1.47 20-Jan-2014 guenther

Threads can't be zombies, only processes, so change zombproc to zombprocess,
make it a list of processes, and change P_NOZOMBIE and P_STOPPED from thread
flags to process flags. Add allprocess list for the code that just wants
to see processes.

ok tedu@


# 1.46 25-Oct-2013 guenther

Move the declarations for dogetrusage(), itimerround(), and dowait4()
to sys/*.h headers so that the compat/linux code can use them.
Change dowait4() to not copyout() the status value, but rather leave
that for its caller, as compat/linux has to translate it, with the
side benefit of simplifying the native code.

Originally written months ago as part of the time_t work; long
memory, prodding, and ok from pirofti@


# 1.45 14-Sep-2013 guenther

Eliminate the unused retval argument from dogetrusage()


# 1.44 14-Sep-2013 guenther

Snapshots for all archs have been built, so remove the T32 code


# 1.43 13-Aug-2013 guenther

Switch time_t, ino_t, clock_t, and struct kevent's ident and data
members to 64bit types. Assign new syscall numbers for (almost
all) the syscalls that involve the affected types, including anything
with time_t, timeval, itimerval, timespec, rusage, dirent, stat,
or kevent arguments. Add a d_off member to struct dirent and replace
getdirentries() with getdents(), thus immensely simplifying and
accelerating telldir/seekdir. Build perl with -DBIG_TIME.

Bump the major on every single base library: the compat bits included
here are only good enough to make the transition; the T32 compat
option will be burned as soon as we've reached the new world and
are happy with the snapshots for all architectures.

DANGER: ABI incompatibility. Updating to this kernel requires extra
work or you won't be able to login: install a snapshot instead.

Much assistance in fixing userland issues from deraadt@ and tedu@
and build assistance from todd@ and otto@


Revision tags: OPENBSD_5_4_BASE
# 1.42 03-Jun-2013 guenther

Convert some internal APIs to use timespecs instead of timevals

ok matthew@ deraadt@


# 1.41 01-Apr-2013 guenther

Make setrlimit() return EINVAL if rlim_cur > rlim_max, per POSIX.
Use limfree() instead of decrementing the reference counter directly.

ok kettenis@


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.40 10-Apr-2012 guenther

Make the KERN_NPROCS and KERN_MAXPROC sysctl()s and the RLIMIT_NPROC rlimit
count processes instead of threads. New sysctl()s KERN_NTHREADS and
KERN_MAXTHREAD count and limit threads. The nprocs and maxproc kernel
variables are replaced by nprocess, maxprocess, nthreads, and maxthread.

ok tedu@ mikeb@


# 1.39 23-Mar-2012 guenther

Make rusage totals, itimers, and profile settings per-process instead
of per-rthread. Handling of per-thread tick and runtime counters
inspired by how FreeBSD does it.

ok kettenis@


# 1.38 19-Mar-2012 guenther

Add tracing and dumping of "pointer to struct" syscall arguments for
structs timespec, timeval, sigaction, and rlimit.

ok otto@ jsing@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.37 07-Mar-2011 guenther

The scheduling 'nice' value is per-process, not per-thread, so move it
into struct process.

ok tedu@ deraadt@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.36 26-Jul-2010 guenther

Correct the links between threads, processes, pgrps, and sessions,
so that the process-level stuff is to/from struct process and not
struct proc. This fixes a bunch of problem cases in rthreads.
Based on earlier work by blambert and myself, but mostly written
at c2k10.

Tested by many: deraadt, sthen, krw, ray, and in snapshots


# 1.35 29-Jun-2010 guenther

Eliminate struct plimit's PL_SHAREMOD flag: it was for COMPAT_IRIX
sproc() support, but we don't have COMPAT_IRIX.
ok krw@ tedu@


Revision tags: OPENBSD_4_7_BASE
# 1.34 04-Jan-2010 guenther

Don't decrement the refcnt on a plimits until after we're done
copying it, so that the process can't sleep in pool_get() and have
the source structure get pool_put() or modified behind its back.

ok deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.33 22-May-2008 thib

Use LIST_FOREACH() instead of handrolling.

From: Pierre Riteau pierre.riteau_att_gmail.com
OK miod@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.32 12-Apr-2007 tedu

move p_limit and p_cred into struct process
leave macros behind for now to keep the commit small
ok art beck miod pedro


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.31 28-Nov-2005 jsg

ansi/deregister.
'go for it' deraadt@


Revision tags: OPENBSD_3_8_BASE
# 1.30 29-May-2005 deraadt

sched work by niklas and art backed out; causes panics


# 1.29 25-May-2005 niklas

This patch is mostly art's work and was done *a year* ago. Art wants to thank
everyone for the prompt review and ok of this work ;-) Yeah, that includes me
too, or maybe especially me. I am sorry.

Change the sched_lock to a mutex. This fixes, among other things, the infamous
"telnet localhost &" problem. The real bug in that case was that the sched_lock
which is by design a non-recursive lock, was recursively acquired, and not
enough releases made us hold the lock in the idle loop, blocking scheduling
on the other processors. Some of the other processors would hold the biglock though,
which made it impossible for cpu 0 to enter the kernel... A nice deadlock.
Let me just say debugging this for days just to realize that it was all fixed
in an old diff no one ever ok'd was somewhat of an anti-climax.

This diff also changes splsched to be correct for all our architectures.


Revision tags: OPENBSD_3_7_BASE
# 1.28 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_6_BASE
# 1.27 13-Jun-2004 niklas

debranch SMP, have fun


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.26 11-Dec-2003 millert

Add id_t type as per POSIX and use it for [gs]etpriority(2).
OK henning@ and deraadt@


# 1.25 11-Dec-2003 millert

POSIX says rlim_t should be unsigned so make it u_quad_t. Also add
POSIX-mandated RLIM_SAVED_MAX and RLIM_SAVED_CUR defines. On OpenBSD
these are identical to RLIM_INFINITY as allowed by POSIX. OK deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.24 01-Sep-2003 henning

match syscallargs comments with reality
from Patrick Latifi <patrick.l@hermes.usherb.ca>
ok jason@ tedu@


# 1.23 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.22 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.21 15-Oct-2002 nordin

Match reality by changing (u_int) -> (int) in comments.


Revision tags: OPENBSD_3_2_BASE
# 1.20 02-Oct-2002 nordin

branches: 1.20.2;
Check for negative values. Inspiration from tedu <grendel@zeitbombe.org>.
ok deraadt@ and art@


# 1.19 21-Jul-2002 art

Map stack pages without VM_PROT_EXECUTE. Notice that right now this
doesn't do anything since no pmap implements exec protection yet.


Revision tags: OPENBSD_3_1_BASE
# 1.18 25-Jan-2002 art

branches: 1.18.2;
Convert plimit allocations to pool.


# 1.17 20-Dec-2001 nordin

Make user/system times increase monotonically. ok deraadt@ and millert@


Revision tags: UBC_BASE
# 1.16 10-Nov-2001 art

branches: 1.16.2;
Move maxdmap and maxsmap to kern_resource.c


# 1.15 06-Nov-2001 miod

Replace inclusion of <vm/foo.h> with the correct <uvm/bar.h> when necessary.
(Look ma, I might have broken the tree)


Revision tags: OPENBSD_3_0_BASE
# 1.14 27-Jun-2001 art

branches: 1.14.2;
remove old vm


# 1.13 26-May-2001 art

Make it a bit more obvious what dosetrlimit does. (shrink).


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE OPENBSD_2_9_BASE
# 1.12 05-May-2000 art

Add limfree prototype to sys/resourcevar.h.


# 1.11 03-Mar-2000 art

Use LIST_ macros instead of internal field names to walk the allproc list.


Revision tags: SMP_BASE kame_19991208
# 1.10 05-Nov-1999 mickey

branches: 1.10.2;
more stack direction fixes; art@ ok


Revision tags: OPENBSD_2_6_BASE
# 1.9 15-Jul-1999 art

vm_offset_t -> {v,p}addr_t ; vm_size_t -> {v,p}size_t


Revision tags: OPENBSD_2_5_BASE
# 1.8 26-Feb-1999 art

uvm allocation and name changes


Revision tags: OPENBSD_2_1_BASE OPENBSD_2_2_BASE OPENBSD_2_3_BASE OPENBSD_2_4_BASE
# 1.7 24-Nov-1996 millert

Sync with NetBSD. Figure NZERO into priorities and that rlim_cur
and rlim_max are >0.


Revision tags: OPENBSD_2_0_BASE
# 1.6 27-Jul-1996 deraadt

sec can be a long


# 1.5 02-Jul-1996 deraadt

unsigned usec can go negative, should be added in as is; netbsd pr#2585; Juergen.Fluk@lrz.tu-muenchen.de


# 1.4 20-Jun-1996 deraadt

calcru() must calculate using u_quad_t to avoid overflows; netbsd pr#2496, brb@exp.com


# 1.3 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.2 14-Dec-1995 deraadt

from netbsd; limfree()


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.64 10-Jun-2019 visa

Avoid changing resource limits in rucheck() by introducing a new state
variable that tracks when to send next SIGXCPU. This eases MP work and
prevents accidental alteration of shared resource limit structs.

OK mpi@ semarie@


# 1.63 02-Jun-2019 visa

Move initialization of limit0 into a dedicated function. This new
function is also a proper place for setting up the plimit pool.

While here, raise the IPL of the plimit pool to IPL_MPFLOOR, needed
in upcoming MP work.

OK claudio@


# 1.62 01-Jun-2019 mpi

Revert to using the SCHED_LOCK() to protect time accounting.

It currently creates a lock ordering problem because SCHED_LOCK() is taken
by hardclock(). That means the "priorities" of a thread should be moved
out of the SCHED_LOCK() first in order to make progress.

Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com
via anton@ as well as by kettenis@


# 1.61 31-May-2019 mpi

Use a per-process mutex to protect time accounting instead of SCHED_LOCK().

Note that hardclock(9) still increments p_{u,s,i}ticks without holding a
lock.

ok visa@, cheloha@


# 1.60 31-May-2019 visa

Rename struct plimit field p_refcnt to pl_refcnt to avoid confusion
with the fields of struct proc. Make pl_refcnt unsigned for upcoming
atomic updating.

OK deraadt@ guenther@


Revision tags: OPENBSD_6_5_BASE
# 1.59 06-Jan-2019 visa

Fix unsafe use of ptsignal() in mi_switch().

ptsignal() has to be called with the kernel lock held. As ensuring the
locking in mi_switch() is not easy, and deferring the signaling using
the task API is not possible because of lock order issues in
mi_switch(), move the CPU time checking into a periodic timer where
the kernel can be locked without issues.

With this change, each process has a dedicated resource check timer.
The timer gets activated only when a CPU time limit is set. Because the
checking is not done as frequently as before, some precision is lost.

Use of timers adapted from FreeBSD.

OK tedu@

Reported-by: syzbot+2f5d62256e3280634623@syzkaller.appspotmail.com


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.58 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.

No objection from millert@, ok tedu@, bluhm@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.57 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.56 25-Aug-2016 dlg

pool_setipl

ok kettenis@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.55 05-Dec-2015 tedu

remove stale lint annotations


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.54 09-Feb-2015 miod

Stop using USRSTACK as the edge of the stack, but rather use the vmspace
vm_minsaddr or vm_maxsaddr, depending upon the direction the stack goes in.

This should have no effect on the existing behaviour.

ok kettenis@ deraadt@


Revision tags: OPENBSD_3_2_BASE
# 1.20 02-Oct-2002 nordin

branches: 1.20.2;
Check for negative values. Inspiration from tedu <grendel@zeitbombe.org>.
ok deraadt@ and art@


# 1.19 21-Jul-2002 art

Map stack pages without VM_PROT_EXECUTE. Notice that right now this
doesn't do anything since no pmap implements exec protection yet.


Revision tags: OPENBSD_3_1_BASE
# 1.18 25-Jan-2002 art

branches: 1.18.2;
Convert plimit allocations to pool.


# 1.17 20-Dec-2001 nordin

Make user/system times increase monotonically. ok deraadt@ and millert@


Revision tags: UBC_BASE
# 1.16 10-Nov-2001 art

branches: 1.16.2;
Move maxdmap and maxsmap to kern_resource.c


# 1.15 06-Nov-2001 miod

Replace inclusion of <vm/foo.h> with the correct <uvm/bar.h> when necessary.
(Look ma, I might have broken the tree)


Revision tags: OPENBSD_3_0_BASE
# 1.14 27-Jun-2001 art

branches: 1.14.2;
remove old vm


# 1.13 26-May-2001 art

Make it a bit more obvious what dosetrlimit does. (shrink).


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE OPENBSD_2_9_BASE
# 1.12 05-May-2000 art

Add limfree prototype to sys/resourcevar.h.


# 1.11 03-Mar-2000 art

Use LIST_ macros instead of internal field names to walk the allproc list.


Revision tags: SMP_BASE kame_19991208
# 1.10 05-Nov-1999 mickey

branches: 1.10.2;
more stack direction fixes; art@ ok


Revision tags: OPENBSD_2_6_BASE
# 1.9 15-Jul-1999 art

vm_offset_t -> {v,p}addr_t ; vm_size_t -> {v,p}size_t


Revision tags: OPENBSD_2_5_BASE
# 1.8 26-Feb-1999 art

uvm allocation and name changes


Revision tags: OPENBSD_2_1_BASE OPENBSD_2_2_BASE OPENBSD_2_3_BASE OPENBSD_2_4_BASE
# 1.7 24-Nov-1996 millert

Sync with NetBSD. Figure NZERO into priorities and that rlim_cur
and rlim_max are >0.


Revision tags: OPENBSD_2_0_BASE
# 1.6 27-Jul-1996 deraadt

sec can be a long


# 1.5 02-Jul-1996 deraadt

unsigned usec can go negative, should be added in as is; netbsd pr#2585; Juergen.Fluk@lrz.tu-muenchen.de


# 1.4 20-Jun-1996 deraadt

calcru() must calculate using u_quad_t to avoid overflows; netbsd pr#2496, brb@exp.com


# 1.3 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.2 14-Dec-1995 deraadt

from netbsd; limfree()


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.61 31-May-2019 mpi

Use a per-process mutex to protect time accounting instead of SCHED_LOCK().

Note that hardclock(9) still increments p_{u,s,i}ticks without holding a
lock.

ok visa@, cheloha@


# 1.60 31-May-2019 visa

Rename struct plimit field p_refcnt to pl_refcnt to avoid confusion
with the fields of struct proc. Make pl_refcnt unsigned for upcoming
atomic updating.

OK deraadt@ guenther@


Revision tags: OPENBSD_6_5_BASE
# 1.59 06-Jan-2019 visa

Fix unsafe use of ptsignal() in mi_switch().

ptsignal() has to be called with the kernel lock held. As ensuring the
locking in mi_switch() is not easy, and deferring the signaling using
the task API is not possible because of lock order issues in
mi_switch(), move the CPU time checking into a periodic timer where
the kernel can be locked without issues.

With this change, each process has a dedicated resource check timer.
The timer gets activated only when a CPU time limit is set. Because the
checking is not done as frequently as before, some precision is lost.

Use of timers adapted from FreeBSD.

OK tedu@

Reported-by: syzbot+2f5d62256e3280634623@syzkaller.appspotmail.com


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.58 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.

No objection from millert@, ok tedu@, bluhm@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.57 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.56 25-Aug-2016 dlg

pool_setipl

ok kettenis@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.55 05-Dec-2015 tedu

remove stale lint annotations


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.54 09-Feb-2015 miod

Stop using USRSTACK as the edge of the stack, but rather use the vmspace
vm_minsaddr or vm_maxsaddr, depending upon the direction the stack goes in.

This should have no effect on the existing behaviourrr.

ok kettenis@ deraadt@


# 1.53 19-Dec-2014 tedu

start retiring the nointr allocator. specify PR_WAITOK as a flag as a
marker for which pools are not interrupt safe. ok dlg


# 1.52 10-Dec-2014 tedu

convert bcopy to memcpy. ok millert


# 1.51 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.50 30-Mar-2014 guenther

Eliminates struct pcred by moving the real and saved ugids into
struct ucred; struct process then directly links to the ucred

Based on a discussion at c2k10 or so before noting that FreeBSD and
NetBSD did this too.

ok matthew@


Revision tags: OPENBSD_5_5_BASE
# 1.49 24-Jan-2014 guenther

exit1() needs to do a final aggregation of the thread's [us]ticks
and runtime to the process totals. Also, add ktracing of struct
rusage in wait4() and getrusage().

problem pointed out by tedu@
ok deraadt@


# 1.48 21-Jan-2014 tedu

bzero -> memset


# 1.47 20-Jan-2014 guenther

Threads can't be zombies, only processes, so change zombproc to zombprocess,
make it a list of processes, and change P_NOZOMBIE and P_STOPPED from thread
flags to process flags. Add allprocess list for the code that just wants
to see processes.

ok tedu@


# 1.46 25-Oct-2013 guenther

Move the declarations for dogetrusage(), itimerround(), and dowait4()
to sys/*.h headers so that the compat/linux code can use them.
Change dowait4() to not copyout() the status value, but rather leave
that for its caller, as compat/linux has to translate it, with the
side benefit of simplifying the native code.

Originally written months ago as part of the time_t work; long
memory, prodding, and ok from pirofti@


# 1.45 14-Sep-2013 guenther

Eliminate the unused retval argument from dogetrusage()


# 1.44 14-Sep-2013 guenther

Snapshots for all archs have been built, so remove the T32 code


# 1.43 13-Aug-2013 guenther

Switch time_t, ino_t, clock_t, and struct kevent's ident and data
members to 64bit types. Assign new syscall numbers for (almost
all) the syscalls that involve the affected types, including anything
with time_t, timeval, itimerval, timespec, rusage, dirent, stat,
or kevent arguments. Add a d_off member to struct dirent and replace
getdirentries() with getdents(), thus immensely simplifying and
accelerating telldir/seekdir. Build perl with -DBIG_TIME.

Bump the major on every single base library: the compat bits included
here are only good enough to make the transition; the T32 compat
option will be burned as soon as we've reached the new world and
are happy with the snapshots for all architectures.

DANGER: ABI incompatibility. Updating to this kernel requires extra
work or you won't be able to login: install a snapshot instead.

Much assistance in fixing userland issues from deraadt@ and tedu@
and build assistance from todd@ and otto@


Revision tags: OPENBSD_5_4_BASE
# 1.42 03-Jun-2013 guenther

Convert some internal APIs to use timespecs instead of timevals

ok matthew@ deraadt@


# 1.41 01-Apr-2013 guenther

Make setrlimit() return EINVAL if rlim_cur > rlim_max, per POSIX.
Use limfree() instead of decrementing the reference counter directly.

ok kettenis@


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.40 10-Apr-2012 guenther

Make the KERN_NPROCS and KERN_MAXPROC sysctl()s and the RLIMIT_NPROC rlimit
count processes instead of threads. New sysctl()s KERN_NTHREADS and
KERN_MAXTHREAD count and limit threads. The nprocs and maxproc kernel
variables are replaced by nprocess, maxprocess, nthreads, and maxthread.

ok tedu@ mikeb@


# 1.39 23-Mar-2012 guenther

Make rusage totals, itimers, and profile settings per-process instead
of per-rthread. Handling of per-thread tick and runtime counters
inspired by how FreeBSD does it.

ok kettenis@


# 1.38 19-Mar-2012 guenther

Add tracing and dumping of "pointer to struct" syscall arguments for
structs timespec, timeval, sigaction, and rlimit.

ok otto@ jsing@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.37 07-Mar-2011 guenther

The scheduling 'nice' value is per-process, not per-thread, so move it
into struct process.

ok tedu@ deraadt@


# 1.59 06-Jan-2019 visa

Fix unsafe use of ptsignal() in mi_switch().

ptsignal() has to be called with the kernel lock held. As ensuring the
locking in mi_switch() is not easy, and deferring the signaling using
the task API is not possible because of lock order issues in
mi_switch(), move the CPU time checking into a periodic timer where
the kernel can be locked without issues.

With this change, each process has a dedicated resource check timer.
The timer gets activated only when a CPU time limit is set. Because the
checking is not done as frequently as before, some precision is lost.

Use of timers adapted from FreeBSD.

OK tedu@

Reported-by: syzbot+2f5d62256e3280634623@syzkaller.appspotmail.com


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.58 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer mess with a per-process field.

No objection from millert@, ok tedu@, bluhm@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.57 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.56 25-Aug-2016 dlg

pool_setipl

ok kettenis@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.55 05-Dec-2015 tedu

remove stale lint annotations


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.54 09-Feb-2015 miod

Stop using USRSTACK as the edge of the stack, but rather use the vmspace
vm_minsaddr or vm_maxsaddr, depending upon the direction the stack goes in.

This should have no effect on the existing behaviour.

ok kettenis@ deraadt@


# 1.53 19-Dec-2014 tedu

start retiring the nointr allocator. specify PR_WAITOK as a flag as a
marker for which pools are not interrupt safe. ok dlg


# 1.52 10-Dec-2014 tedu

convert bcopy to memcpy. ok millert


# 1.51 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.50 30-Mar-2014 guenther

Eliminates struct pcred by moving the real and saved ugids into
struct ucred; struct process then directly links to the ucred

Based on a discussion at c2k10 or so before noting that FreeBSD and
NetBSD did this too.

ok matthew@


Revision tags: OPENBSD_5_5_BASE
# 1.49 24-Jan-2014 guenther

exit1() needs to do a final aggregation of the thread's [us]ticks
and runtime to the process totals. Also, add ktracing of struct
rusage in wait4() and getrusage().

problem pointed out by tedu@
ok deraadt@


# 1.48 21-Jan-2014 tedu

bzero -> memset


# 1.47 20-Jan-2014 guenther

Threads can't be zombies, only processes, so change zombproc to zombprocess,
make it a list of processes, and change P_NOZOMBIE and P_STOPPED from thread
flags to process flags. Add allprocess list for the code that just wants
to see processes.

ok tedu@


# 1.46 25-Oct-2013 guenther

Move the declarations for dogetrusage(), itimerround(), and dowait4()
to sys/*.h headers so that the compat/linux code can use them.
Change dowait4() to not copyout() the status value, but rather leave
that for its caller, as compat/linux has to translate it, with the
side benefit of simplifying the native code.

Originally written months ago as part of the time_t work; long
memory, prodding, and ok from pirofti@


# 1.45 14-Sep-2013 guenther

Eliminate the unused retval argument from dogetrusage()


# 1.44 14-Sep-2013 guenther

Snapshots for all archs have been built, so remove the T32 code


# 1.43 13-Aug-2013 guenther

Switch time_t, ino_t, clock_t, and struct kevent's ident and data
members to 64bit types. Assign new syscall numbers for (almost
all) the syscalls that involve the affected types, including anything
with time_t, timeval, itimerval, timespec, rusage, dirent, stat,
or kevent arguments. Add a d_off member to struct dirent and replace
getdirentries() with getdents(), thus immensely simplifying and
accelerating telldir/seekdir. Build perl with -DBIG_TIME.

Bump the major on every single base library: the compat bits included
here are only good enough to make the transition; the T32 compat
option will be burned as soon as we've reached the new world and
are happy with the snapshots for all architectures.

DANGER: ABI incompatibility. Updating to this kernel requires extra
work or you won't be able to login: install a snapshot instead.

Much assistance in fixing userland issues from deraadt@ and tedu@
and build assistance from todd@ and otto@


Revision tags: OPENBSD_5_4_BASE
# 1.42 03-Jun-2013 guenther

Convert some internal APIs to use timespecs instead of timevals

ok matthew@ deraadt@


# 1.41 01-Apr-2013 guenther

Make setrlimit() return EINVAL if rlim_cur > rlim_max, per POSIX.
Use limfree() instead of decrementing the reference counter directly.

ok kettenis@


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.40 10-Apr-2012 guenther

Make the KERN_NPROCS and KERN_MAXPROC sysctl()s and the RLIMIT_NPROC rlimit
count processes instead of threads. New sysctl()s KERN_NTHREADS and
KERN_MAXTHREAD count and limit threads. The nprocs and maxproc kernel
variables are replaced by nprocess, maxprocess, nthreads, and maxthread.

ok tedu@ mikeb@


# 1.39 23-Mar-2012 guenther

Make rusage totals, itimers, and profile settings per-process instead
of per-rthread. Handling of per-thread tick and runtime counters
inspired by how FreeBSD does it.

ok kettenis@


# 1.38 19-Mar-2012 guenther

Add tracing and dumping of "pointer to struct" syscall arguments for
structs timespec, timeval, sigaction, and rlimit.

ok otto@ jsing@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.37 07-Mar-2011 guenther

The scheduling 'nice' value is per-process, not per-thread, so move it
into struct process.

ok tedu@ deraadt@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.36 26-Jul-2010 guenther

Correct the links between threads, processes, pgrps, and sessions,
so that the process-level stuff is to/from struct process and not
struct proc. This fixes a bunch of problem cases in rthreads.
Based on earlier work by blambert and myself, but mostly written
at c2k10.

Tested by many: deraadt, sthen, krw, ray, and in snapshots


# 1.35 29-Jun-2010 guenther

Eliminate struct plimit's PL_SHAREMOD flag: it was for COMPAT_IRIX
sproc() support, but we don't have COMPAT_IRIX.
ok krw@ tedu@


Revision tags: OPENBSD_4_7_BASE
# 1.34 04-Jan-2010 guenther

Don't decrement the refcnt on a plimits until after we're done
copying it, so that the process can't sleep in pool_get() and have
the source structure get pool_put() or modified behind its back.

ok deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.33 22-May-2008 thib

Use LIST_FOREACH() instead of handrolling.

From: Pierre Riteau pierre.riteau_att_gmail.com
OK miod@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.32 12-Apr-2007 tedu

move p_limit and p_cred into struct process
leave macros behind for now to keep the commit small
ok art beck miod pedro


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.31 28-Nov-2005 jsg

ansi/deregister.
'go for it' deraadt@


Revision tags: OPENBSD_3_8_BASE
# 1.30 29-May-2005 deraadt

sched work by niklas and art backed out; causes panics


# 1.29 25-May-2005 niklas

This patch is mostly art's work and was done *a year* ago. Art wants to thank
everyone for the prompt review and ok of this work ;-) Yeah, that includes me
too, or maybe especially me. I am sorry.

Change the sched_lock to a mutex. This fixes, among other things, the infamous
"telnet localhost &" problem. The real bug in that case was that the sched_lock
which is by design a non-recursive lock, was recursively acquired, and not
enough releases made us hold the lock in the idle loop, blocking scheduling
on the other processors. Some of the other processors would hold the biglock though,
which made it impossible for cpu 0 to enter the kernel... A nice deadlock.
Let me just say debugging this for days just to realize that it was all fixed
in an old diff no one ever ok'd was somewhat of an anti-climax.

This diff also changes splsched to be correct for all our architectures.


Revision tags: OPENBSD_3_7_BASE
# 1.28 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_6_BASE
# 1.27 13-Jun-2004 niklas

debranch SMP, have fun


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.26 11-Dec-2003 millert

Add id_t type as per POSIX and use it for [gs]etpriority(2).
OK henning@ and deraadt@


# 1.25 11-Dec-2003 millert

POSIX says rlim_t should be unsigned so make it u_quad_t. Also add
POSIX-mandated RLIM_SAVED_MAX and RLIM_SAVED_CUR defines. On OpenBSD
these are identical to RLIM_INFINITY as allowed by POSIX. OK deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.24 01-Sep-2003 henning

match syscallargs comments with reality
from Patrick Latifi <patrick.l@hermes.usherb.ca>
ok jason@ tedu@


# 1.23 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred-only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.22 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.21 15-Oct-2002 nordin

Match reality by changing (u_int) -> (int) in comments.


Revision tags: OPENBSD_3_2_BASE
# 1.20 02-Oct-2002 nordin

branches: 1.20.2;
Check for negative values. Inspiration from tedu <grendel@zeitbombe.org>.
ok deraadt@ and art@


# 1.19 21-Jul-2002 art

Map stack pages without VM_PROT_EXECUTE. Notice that right now this
doesn't do anything since no pmap implements exec protection yet.


Revision tags: OPENBSD_3_1_BASE
# 1.18 25-Jan-2002 art

branches: 1.18.2;
Convert plimit allocations to pool.


# 1.17 20-Dec-2001 nordin

Make user/system times increase monotonically. ok deraadt@ and millert@


Revision tags: UBC_BASE
# 1.16 10-Nov-2001 art

branches: 1.16.2;
Move maxdmap and maxsmap to kern_resource.c


# 1.15 06-Nov-2001 miod

Replace inclusion of <vm/foo.h> with the correct <uvm/bar.h> when necessary.
(Look ma, I might have broken the tree)


Revision tags: OPENBSD_3_0_BASE
# 1.14 27-Jun-2001 art

branches: 1.14.2;
remove old vm


# 1.13 26-May-2001 art

Make it a bit more obvious what dosetrlimit does. (shrink).


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE OPENBSD_2_9_BASE
# 1.12 05-May-2000 art

Add limfree prototype to sys/resourcevar.h.


# 1.11 03-Mar-2000 art

Use LIST_ macros instead of internal field names to walk the allproc list.


Revision tags: SMP_BASE kame_19991208
# 1.10 05-Nov-1999 mickey

branches: 1.10.2;
more stack direction fixes; art@ ok


Revision tags: OPENBSD_2_6_BASE
# 1.9 15-Jul-1999 art

vm_offset_t -> {v,p}addr_t ; vm_size_t -> {v,p}size_t


Revision tags: OPENBSD_2_5_BASE
# 1.8 26-Feb-1999 art

uvm allocation and name changes


Revision tags: OPENBSD_2_1_BASE OPENBSD_2_2_BASE OPENBSD_2_3_BASE OPENBSD_2_4_BASE
# 1.7 24-Nov-1996 millert

Sync with NetBSD. Figure NZERO into priorities and that rlim_cur
and rlim_max are >0.


Revision tags: OPENBSD_2_0_BASE
# 1.6 27-Jul-1996 deraadt

sec can be a long


# 1.5 02-Jul-1996 deraadt

unsigned usec can go negative, should be added in as is; netbsd pr#2585; Juergen.Fluk@lrz.tu-muenchen.de


# 1.4 20-Jun-1996 deraadt

calcru() must calculate using u_quad_t to avoid overflows; netbsd pr#2496, brb@exp.com


# 1.3 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.2 14-Dec-1995 deraadt

from netbsd; limfree()


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.57 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.56 25-Aug-2016 dlg

pool_setipl

ok kettenis@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.55 05-Dec-2015 tedu

remove stale lint annotations


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.54 09-Feb-2015 miod

Stop using USRSTACK as the edge of the stack, but rather use the vmspace
vm_minsaddr or vm_maxsaddr, depending upon the direction the stack goes in.

This should have no effect on the existing behaviourrr.

ok kettenis@ deraadt@


# 1.53 19-Dec-2014 tedu

start retiring the nointr allocator. specify PR_WAITOK as a flag as a
marker for which pools are not interrupt safe. ok dlg


# 1.52 10-Dec-2014 tedu

convert bcopy to memcpy. ok millert


# 1.51 16-Nov-2014 deraadt

Replace a plethora of historical protection options with just
PROT_NONE, PROT_READ, PROT_WRITE, and PROT_EXEC from mman.h.
PROT_MASK is introduced as the one true way of extracting those bits.
Remove UVM_ADV_* wrapper, using the standard names.
ok doug guenther kettenis


Revision tags: OPENBSD_5_6_BASE
# 1.50 30-Mar-2014 guenther

Eliminates struct pcred by moving the real and saved ugids into
struct ucred; struct process then directly links to the ucred

Based on a discussion at c2k10 or so before noting that FreeBSD and
NetBSD did this too.

ok matthew@


Revision tags: OPENBSD_5_5_BASE
# 1.49 24-Jan-2014 guenther

exit1() needs to do a final aggregation of the thread's [us]ticks
and runtime to the process totals. Also, add ktracing of struct
rusage in wait4() and getrusage().

problem pointed out by tedu@
ok deraadt@


# 1.48 21-Jan-2014 tedu

bzero -> memset


# 1.47 20-Jan-2014 guenther

Threads can't be zombies, only processes, so change zombproc to zombprocess,
make it a list of processes, and change P_NOZOMBIE and P_STOPPED from thread
flags to process flags. Add allprocess list for the code that just wants
to see processes.

ok tedu@


# 1.46 25-Oct-2013 guenther

Move the declarations for dogetrusage(), itimerround(), and dowait4()
to sys/*.h headers so that the compat/linux code can use them.
Change dowait4() to not copyout() the status value, but rather leave
that for its caller, as compat/linux has to translate it, with the
side benefit of simplifying the native code.

Originally written months ago as part of the time_t work; long
memory, prodding, and ok from pirofti@


# 1.45 14-Sep-2013 guenther

Eliminate the unused retval argument from dogetrusage()


# 1.44 14-Sep-2013 guenther

Snapshots for all archs have been built, so remove the T32 code


# 1.43 13-Aug-2013 guenther

Switch time_t, ino_t, clock_t, and struct kevent's ident and data
members to 64bit types. Assign new syscall numbers for (almost
all) the syscalls that involve the affected types, including anything
with time_t, timeval, itimerval, timespec, rusage, dirent, stat,
or kevent arguments. Add a d_off member to struct dirent and replace
getdirentries() with getdents(), thus immensely simplifying and
accelerating telldir/seekdir. Build perl with -DBIG_TIME.

Bump the major on every single base library: the compat bits included
here are only good enough to make the transition; the T32 compat
option will be burned as soon as we've reached the new world are
are happy with the snapshots for all architectures.

DANGER: ABI incompatibility. Updating to this kernel requires extra
work or you won't be able to login: install a snapshot instead.

Much assistance in fixing userland issues from deraadt@ and tedu@
and build assistance from todd@ and otto@


Revision tags: OPENBSD_5_4_BASE
# 1.42 03-Jun-2013 guenther

Convert some internal APIs to use timespecs instead of timevals

ok matthew@ deraadt@


# 1.41 01-Apr-2013 guenther

Make setrlimit() return EINVAL if rlim_cur > rlim_max, per POSIX.
Use limfree() instead of decrementing the reference counter directly.

ok kettenis@


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.40 10-Apr-2012 guenther

Make the KERN_NPROCS and KERN_MAXPROC sysctl()s and the RLIMIT_NPROC rlimit
count processes instead of threads. New sysctl()s KERN_NTHREADS and
KERN_MAXTHREAD count and limit threads. The nprocs and maxproc kernel
variables are replaced by nprocess, maxprocess, nthreads, and maxthread.

ok tedu@ mikeb@


# 1.39 23-Mar-2012 guenther

Make rusage totals, itimers, and profile settings per-process instead
of per-rthread. Handling of per-thread tick and runtime counters
inspired by how FreeBSD does it.

ok kettenis@


# 1.38 19-Mar-2012 guenther

Add tracing and dumping of "pointer to struct" syscall arguments for
structs timespec, timeval, sigaction, and rlimit.

ok otto@ jsing@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.37 07-Mar-2011 guenther

The scheduling 'nice' value is per-process, not per-thread, so move it
into struct process.

ok tedu@ deraadt@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.36 26-Jul-2010 guenther

Correct the links between threads, processes, pgrps, and sessions,
so that the process-level stuff is to/from struct process and not
struct proc. This fixes a bunch of problem cases in rthreads.
Based on earlier work by blambert and myself, but mostly written
at c2k10.

Tested by many: deraadt, sthen, krw, ray, and in snapshots


# 1.35 29-Jun-2010 guenther

Eliminate struct plimit's PL_SHAREMOD flag: it was for COMPAT_IRIX
sproc() support, but we don't have COMPAT_IRIX.
ok krw@ tedu@


Revision tags: OPENBSD_4_7_BASE
# 1.34 04-Jan-2010 guenther

Don't decrement the refcnt on a plimits until after we're done
copying it, so that the process can't sleep in pool_get() and have
the source structure get pool_put() or modified behind its back.

ok deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.33 22-May-2008 thib

Use LIST_FOREACH() instead of handrolling.

From: Pierre Riteau pierre.riteau_att_gmail.com
OK miod@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.32 12-Apr-2007 tedu

move p_limit and p_cred into struct process
leave macros behind for now to keep the commit small
ok art beck miod pedro


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE
# 1.31 28-Nov-2005 jsg

ansi/deregister.
'go for it' deraadt@


Revision tags: OPENBSD_3_8_BASE
# 1.30 29-May-2005 deraadt

sched work by niklas and art backed out; causes panics


# 1.29 25-May-2005 niklas

This patch is mortly art's work and was done *a year* ago. Art wants to thank
everyone for the prompt review and ok of this work ;-) Yeah, that includes me
too, or maybe especially me. I am sorry.

Change the sched_lock to a mutex. This fixes, among other things, the infamous
"telnet localhost &" problem. The real bug in that case was that the sched_lock
which is by design a non-recursive lock, was recursively acquired, and not
enough releases made us hold the lock in the idle loop, blocking scheduling
on the other processors. Some of the other processors would hold the biglock though,
which made it impossible for cpu 0 to enter the kernel... A nice deadlock.
Let me just say debugging this for days just to realize that it was all fixed
in an old diff noone ever ok'd was somewhat of an anti-climax.

This diff also changes splsched to be correct for all our architectures.


Revision tags: OPENBSD_3_7_BASE
# 1.28 26-Dec-2004 miod

Use list and queue macros where applicable to make the code easier to read;
no change in compiler assembly output.


Revision tags: OPENBSD_3_6_BASE
# 1.27 13-Jun-2004 niklas

debranch SMP, have fun


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.26 11-Dec-2003 millert

Add id_t type as per POSIX and use it for [gs]etpriority(2).
OK henning@ and deraadt@


# 1.25 11-Dec-2003 millert

POSIX says rlim_t should be unsigned so make it u_quad_t. Also add
POSIX-mandated RLIM_SAVED_MAX and RLIM_SAVED_CUR defines. On OpenBSD
these are identical to RLIM_INFINITY as allowed by POSIX. OK deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.24 01-Sep-2003 henning

match syscallargs comments with reality
from Patrick Latifi <patrick.l@hermes.usherb.ca>
ok jason@ tedu@


# 1.23 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls user suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.22 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.21 15-Oct-2002 nordin

Match reality by changing (u_int) -> (int) in comments.


Revision tags: OPENBSD_3_2_BASE
# 1.20 02-Oct-2002 nordin

branches: 1.20.2;
Check for negative values. Inspiration from tedu <grendel@zeitbombe.org>.
ok deraadt@ and art@


# 1.19 21-Jul-2002 art

Map stack pages without VM_PROT_EXECUTE. Notice that right now this
doesn't do anything since no pmap implements exec protection yet.


Revision tags: OPENBSD_3_1_BASE
# 1.18 25-Jan-2002 art

branches: 1.18.2;
Convert plimit allocations to pool.


# 1.17 20-Dec-2001 nordin

Make user/system times increase monotonically. ok deraadt@ and millert@


Revision tags: UBC_BASE
# 1.16 10-Nov-2001 art

branches: 1.16.2;
Move maxdmap and maxsmap to kern_resource.c


# 1.15 06-Nov-2001 miod

Replace inclusion of <vm/foo.h> with the correct <uvm/bar.h> when necessary.
(Look ma, I might have broken the tree)


Revision tags: OPENBSD_3_0_BASE
# 1.14 27-Jun-2001 art

branches: 1.14.2;
remove old vm


# 1.13 26-May-2001 art

Make it a bit more obvious what dosetrlimit does. (shrink).


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE OPENBSD_2_9_BASE
# 1.12 05-May-2000 art

Add limfree prototype to sys/resourcevar.h.


# 1.11 03-Mar-2000 art

Use LIST_ macros instead of internal field names to walk the allproc list.


Revision tags: SMP_BASE kame_19991208
# 1.10 05-Nov-1999 mickey

branches: 1.10.2;
more stack direction fixes; art@ ok


Revision tags: OPENBSD_2_6_BASE
# 1.9 15-Jul-1999 art

vm_offset_t -> {v,p}addr_t ; vm_size_t -> {v,p}size_t


Revision tags: OPENBSD_2_5_BASE
# 1.8 26-Feb-1999 art

uvm allocation and name changes


Revision tags: OPENBSD_2_1_BASE OPENBSD_2_2_BASE OPENBSD_2_3_BASE OPENBSD_2_4_BASE
# 1.7 24-Nov-1996 millert

Sync with NetBSD. Figure NZERO into priorities and that rlim_cur
and rlim_max are >0.


Revision tags: OPENBSD_2_0_BASE
# 1.6 27-Jul-1996 deraadt

sec can be a long


# 1.5 02-Jul-1996 deraadt

unsigned usec can go negative, should be added in as is; netbsd pr#2585; Juergen.Fluk@lrz.tu-muenchen.de


# 1.4 20-Jun-1996 deraadt

calcru() must calculate using u_quad_t to avoid overflows; netbsd pr#2496, brb@exp.com


# 1.3 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.2 14-Dec-1995 deraadt

from netbsd; limfree()


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision