History log of /netbsd-current/sys/sys/proc.h
Revision Date Author Comments
# 1.373 04-Oct-2023 ad

p->p_stat is actually locked by proc_lock so document it that way and
shuffle some fields around so it's not next to p->p_trace_enabled (that
needs some attention too, in a later change).


# 1.372 11-Jul-2023 riastradh

sys: Rip <sys/resourcevar.h> out of <uvm/uvm_param.h>.

And thus out of <sys/param.h>, which is exceedingly overused and
fragile and delenda est.

Should fix (some) issues with the recent inclusion of machine/lock.h
in various machine/mutex.h files.


# 1.371 01-May-2023 mlelstv

Default PROC_MACHINE_ARCH to machine_arch and use this for magic
symlinks to resolve "@machine_arch".

This keeps behaviour of magic symlinks and 'uname -p' output the same.
Fixes PR 57320.


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.370 09-May-2022 wiz

branches: 1.370.4;
fix typo in comment


# 1.369 10-Oct-2021 thorpej

Changes to make EVFILT_PROC MP-safe:

Because the locking protocol around processes is somewhat complex
compared to other events that can be posted on kqueues, introduce
new functions for posting NOTE_EXEC, NOTE_EXIT, and NOTE_FORK,
rather than just using the generic knote() function. These functions
KASSERT() their locking expectations, and deal with other complexities
for each situation.

knote_proc_fork(), in particular, needs to handle NOTE_TRACK, which
requires allocation of a new knote to attach to the child process. We
don't want to be allocating memory while holding the parent's p_lock.
Furthermore, we also have to attach the tracking note to the child
process, which means we have to acquire the child's p_lock.

So, to handle all this, we introduce some additional synchronization
infrastructure around the 'knote' structure:

- Add the ability to mark a knote as being in a state of flux. Knotes
in this state are guaranteed not to be detached/deleted, thus allowing
a code path to drop other locks after putting a knote in this state.

- Code paths that wish to detach/delete a knote must first check if the
knote is in-flux. If so, they must wait for it to quiesce. Because
multiple threads of execution may attempt this concurrently, a mechanism
exists for a single LWP to claim the detach responsibility; all other
threads simply wait for the knote to disappear before they can make
further progress.

- When kqueue_scan() encounters an in-flux knote, it simply treats the
situation just like encountering another thread's queue marker -- wait
for the flux to settle and continue on.

(The "in-flux knote" idea was inspired by FreeBSD, but this works differently
from their implementation, as the two kqueue implementations have diverged
quite a bit.)

knote_proc_fork() uses this infrastructure to implement NOTE_TRACK like so:

- Attempt to put the original tracking knote into a state of flux; if this
fails (because the note has a detach pending), we skip all processing
(the original process has lost interest, and we simply won the race).

- Once the note is in-flux, drop the kq and forking process's locks, and
allocate 2 knotes: one to post the NOTE_CHILD event, and one to attach
a new NOTE_TRACK to the child process. Notably, we do NOT go through
kqueue_register() to do this, but rather do all of the work directly
and KASSERT() our assumptions; this allows us to directly control our
interaction with locks. All memory allocations here are performed with
KM_NOSLEEP, in order to prevent holding the original knote in-flux
indefinitely.

- Because the NOTE_TRACK use case adds knotes to kqueues through a
sort of back-door mechanism, we must serialize with the closing of
the destination kqueue's file descriptor, so steal another bit from
the kq_count field to notify other threads that a kqueue is on its
way out to prevent new knotes from being enqueued while the close
path detaches them.

In addition to fixing EVFILT_PROC's reliance on KERNEL_LOCK, this also
fixes a long-standing bug whereby a NOTE_CHILD event could be dropped
if the child process exited before the interested process received the
NOTE_CHILD event (the same knote would be used to deliver the NOTE_EXIT
event, and would clobber the NOTE_CHILD's 'data' field).

Add a bunch of comments to explain what's going on in various critical
sections, and sprinkle additional KASSERT()s to validate assumptions
in several more locations.
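
The in-flux protocol described in this commit can be modelled in userland
roughly as follows. This is a minimal single-threaded sketch, not the kernel
code: the structure, flag and function names (model_knote, KN_MODEL_*,
model_*()) are hypothetical, and where the kernel would sleep until the flux
settles, the model simply reports that the detach cannot proceed yet.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model; not the kernel's definitions. */
#define KN_MODEL_INFLUX		0x01u	/* knote must not be detached now */
#define KN_MODEL_WILLDETACH	0x02u	/* some thread has claimed the detach */

struct model_knote {
	unsigned flags;
};

/*
 * Try to put the knote into flux.  Fails if a detach is already pending,
 * in which case the caller skips all further processing.
 */
static bool
model_enter_flux(struct model_knote *kn)
{
	if (kn->flags & KN_MODEL_WILLDETACH)
		return false;
	assert((kn->flags & KN_MODEL_INFLUX) == 0);
	kn->flags |= KN_MODEL_INFLUX;
	return true;
}

/* Leave the in-flux state again. */
static void
model_leave_flux(struct model_knote *kn)
{
	assert(kn->flags & KN_MODEL_INFLUX);
	kn->flags &= ~KN_MODEL_INFLUX;
}

/* Claim responsibility for the detach; only the first caller wins. */
static bool
model_claim_detach(struct model_knote *kn)
{
	if (kn->flags & KN_MODEL_WILLDETACH)
		return false;
	kn->flags |= KN_MODEL_WILLDETACH;
	return true;
}

/* A claimed knote may be torn down only once it is no longer in flux. */
static bool
model_can_detach_now(const struct model_knote *kn)
{
	return (kn->flags & KN_MODEL_WILLDETACH) != 0 &&
	    (kn->flags & KN_MODEL_INFLUX) == 0;
}
```

In the model, a second caller of model_claim_detach() gets false, which
corresponds to the "all other threads simply wait" case in the description.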


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.368 05-Dec-2020 thorpej

Refactor interval timers to make it possible to support types other than
the BSD/POSIX per-process timers:

- "struct ptimer" is split into "struct itimer" (common interval timer
data) and "struct ptimer" (per-process timer data, which contains a
"struct itimer").

- Introduce a new "struct itimer_ops" that supplies information about
the specific kind of interval timer, including its processing
queue, the softint handle used to schedule processing, the function
to call when the timer fires (which adds it to the queue), and an
optional function to call when the CLOCK_REALTIME clock is changed by
a call to clock_settime() or settimeofday().

- Rename some functions to clearly identify what they're operating on
(ptimer vs itimer).

- Use kmem(9) to allocate ptimer-related structures, rather than having
dedicated pools for them.

Welcome to NetBSD 9.99.77.


# 1.367 23-May-2020 ad

branches: 1.367.2;
Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.366 23-May-2020 ad

- Replace pid_table_lock with a lockless lookup covered by pserialize, with
the "writer" side being pid_table expansion. The basic idea is that when
doing an LWP lookup there is usually already a lock held (p->p_lock), or a
spin mutex that needs to be taken (l->l_mutex), and either can be used to
get the found LWP stable and confidently determine that all is correct.

- For user processes LSLARVAL implies the same thing as LSIDL ("not visible
by ID"), and lookup by ID in proc0 doesn't really happen. In-tree the new
state should be understood by top(1), the tty subsystem and so on, and
would attract the attention of 3rd party kernel grovellers in time, so
remove it and just rely on LSIDL.


# 1.365 07-May-2020 kamil

On debugger attach to a prestarted process don't report SIGTRAP

Introduce PSL_TRACEDCHILD that indicates tracking of birth of a process.
A freshly forked process checks whether it is traced and if so, reports
SIGTRAP + TRAP_CHLD event to a debugger as a result of tracking forks-like
events. There is a time window when a debugger can attach to a newly
created process and receive SIGTRAP + TRAP_CHLD instead of SIGSTOP.

Fixes races in t_ptrace_wait* tests when a test hangs or misbehaves,
especially the ones reported in tracer_sysctl_lookup_without_duplicates.


# 1.364 29-Apr-2020 thorpej

- proc_find() retains traditional semantics of requiring the canonical
PID to look up a proc. Add a separate proc_find_lwpid() to look up a
proc by the ID of any of its LWPs.
- Add proc_find_lwp_acquire_proc(), which enables looking up the LWP
*and* a proc given the ID of any LWP. Returns with the proc::p_lock
held.
- Rewrite lwp_find2() in terms of proc_find_lwp_acquire_proc(), and
allow the proc to be wildcarded, rather than just curproc or specific
proc.
- lwp_find2() now subsumes the original intent of lwp_getref_lwpid(), but
in a much nicer way, so garbage-collect the remnants of that recently
added mechanism.


Revision tags: bouyer-xenpvh-base2
# 1.363 24-Apr-2020 thorpej

Overhaul the way LWP IDs are allocated. Instead of each process having its
own LWP ID space, LWP IDs now come from the same number space as PIDs. The
lead LWP of a process gets the PID as its LID. If a multi-LWP process's
lead LWP exits, the PID persists for the process.

In addition to providing system-wide unique thread IDs, this also lets us
eliminate the per-process LWP radix tree, and some associated locks.

Remove the separate "global thread ID" map added previously; it is no longer
needed to provide this functionality.

Nudged in this direction by ad@ and chs@.
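
The allocation rule above can be illustrated with a small userland model.
This is only a sketch under stated assumptions: the single counter and all
model_* names are hypothetical stand-ins, and the real kernel allocates IDs
from its PID table rather than from a plain counter.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t model_id_t;	/* stands in for both pid_t and lwpid_t */

/* One system-wide counter: PIDs and LIDs come from the same space. */
static model_id_t model_next_id = 1;

static model_id_t
model_id_alloc(void)
{
	return model_next_id++;
}

struct model_proc {
	model_id_t pid;
	int nlwps;
};

/* Create a process; its lead LWP reuses the PID as its LID. */
static model_id_t
model_proc_create(struct model_proc *p)
{
	p->pid = model_id_alloc();
	p->nlwps = 1;
	return p->pid;		/* LID of the lead LWP */
}

/* Additional LWPs get fresh IDs from the shared space. */
static model_id_t
model_lwp_create(struct model_proc *p)
{
	assert(p->nlwps > 0);
	p->nlwps++;
	return model_id_alloc();
}
```

Because every ID is drawn from one space, an LWP ID identifies a thread
system-wide without any per-process lookup structure.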


Revision tags: phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.362 06-Apr-2020 kamil

branches: 1.362.2;
Reintroduce struct proc::p_oppid

Relying on p_opptr is not safe as there is a race between:
- spawner giving birth to a child process and being killed
- spawnee accessing p_opptr and reporting TRAP_CHLD

PR kern/54786 by Andreas Gustafsson


# 1.361 05-Apr-2020 christos

There is no "s" lock.


# 1.360 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.359 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.358 29-Jan-2020 ad

- Track LWPs in a per-process radixtree. It uses no extra memory in the
single threaded case. Replace scans of p->p_lwps with lookups in the
tree. Find free LIDs for new LWPs in the tree. Replace the hashed sleep
queues for park/unpark with lookups in the tree under cover of a RW lock.

- lwp_wait(): if waiting on a specific LWP, find the LWP via tree lookup and
return EINVAL if it's detached, not ESRCH.

- Group the locks in struct proc at the end of the struct in their own cache
line.

- Add some comments.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.357 12-Oct-2019 kamil

branches: 1.357.2;
Remove now unused p_oppid from struct proc


# 1.356 30-Sep-2019 kamil

Move TRAP_CHLD/TRAP_LWP ptrace information from struct proc to siginfo

Storing struct ptrace_state information inside struct proc was vulnerable
to synchronization bugs, as multiple events emitted at the same time were
overwriting one another.

Cache the original parent process id in p_oppid. Reusing p_opptr here is
in theory prone to a slight race condition.

Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for calls
prompting for the value in cases when no appropriate event was
registered.

Add an alternative approach to check the ptrace_state information, directly
from the siginfo_t value returned from PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compat with older NetBSD and
OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.

Add a couple of compile-time asserts for assumptions in the code.

No functional change intended in existing ptrace(2) software.

All ATF ptrace(2) and ATF GDB tests pass.

This change improves reliability of the threading ptrace(2) code.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.355 15-Jul-2019 pgoyette

Move a comment line to get it next to the line it describes, avoiding
intervening unrelated text.

NFCI


# 1.354 21-Jun-2019 kamil

Eliminate PS_NOTIFYSTOP remnants from the kernel

This flag used to be useful in /proc (BSD4.4-style) debugging semantics.
Traced child events were notified without signaling the parent.

This property was removed in NetBSD-8.0 and had no users.

This change simplifies the signal code, removing dead branches.

NFCI


# 1.353 11-Jun-2019 kamil

Add support for PTRACE_POSIX_SPAWN to report posix_spawn(3) events

posix_spawn(3) is a first class syscall in NetBSD, unlike
(V)FORK+EXEC, as these operations are executed in one go. This differs from
Linux and FreeBSD, where posix_spawn(3) is implemented with existing kernel
primitives (clone(2), vfork(2), exec(3)) inside libc.

Typically LLDB and GDB software is aware of FORK/VFORK events. As discussed
with the LLDB community, instead of slicing the posix_spawn(3) operation
into phases emulating (V)FORK+EXEC(+VFORK_DONE) and returning intermediate
state to the debugger, that might have abnormal state, introduce new event
type: PTRACE_POSIX_SPAWN.

A debugger implementor can easily map it into existing fork+exec semantics
or treat as a distinct event.

There is no functional change for existing debuggers as there was no
support for reporting posix_spawn(3) events on the kernel side.


Revision tags: phil-wifi-20190609 isaki-audio2-base
# 1.352 06-Apr-2019 kamil

Centralized shared part of child_return() into MI part

Add a new function md_child_return() for MD specific bits only.

New child_return() is now part of MI and central code that handles
uniformly tracing code (KTR and ptrace(2)).

Synchronize value passed to ktrsysret() among ports to SYS_fork. This is
a traditional value and accessing p_lflag to check for PL_PPWAIT shall
use locking against proc_lock. Returning SYS_fork vs SYS_vfork still isn't
correct enough as there are more entry points to forking code. Instead of
making it too good, just settle with plain SYS_fork for all ports.


# 1.351 01-Mar-2019 christos

PR/53998: Joel Bertrand: Limit the number of semaphores on a
per-user basis not a per-process. We cannot really keep track on
a per-process basis because a parent process can create the semaphore
and a child can free it taking credit for it. There is also a
similar issue about resource exhaustion if we limited the number
of lwps per process as opposed to per user (which we don't).


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.350 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
    0: no access
    1: access to processes with open /dev/kmem
    2: access to everyone
  defaults:
    0: KASLR kernels
    1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
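
The tri-state policy listed above amounts to a small decision function. The
following is a sketch only: the enum, the function names and the
has_kmem_open parameter are hypothetical and do not reflect the signature of
the kernel's get_expose_address().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three kern.expose_address settings. */
enum model_expose {
	MODEL_EXPOSE_NONE = 0,	/* no access */
	MODEL_EXPOSE_KMEM = 1,	/* processes with /dev/kmem open */
	MODEL_EXPOSE_ALL = 2	/* everyone */
};

/* Decide whether kernel addresses may be shown to the caller. */
static bool
model_may_expose(int level, bool has_kmem_open)
{
	assert(level >= MODEL_EXPOSE_NONE && level <= MODEL_EXPOSE_ALL);
	switch (level) {
	case MODEL_EXPOSE_ALL:
		return true;
	case MODEL_EXPOSE_KMEM:
		return has_kmem_open;
	default:
		return false;
	}
}

/* Default setting: 0 on KASLR kernels, 1 otherwise. */
static int
model_default_level(bool kaslr)
{
	return kaslr ? MODEL_EXPOSE_NONE : MODEL_EXPOSE_KMEM;
}
```

Evaluating the policy once per sysctl call, rather than once per process,
matches the efficiency point made in the list.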


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.349 10-Aug-2018 pgoyette

Allow syscall_establish() to install new syscalls when the existing
entry-point is either sys_nomodule or sys_nosys. Update the
makesyscalls.sh script to create a const array of bits to allow
syscall_disestablish() to properly restore the original entry-point.
Update all the initializers of struct emul to initialize the pointer
to the bit array struct emul.
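
The establish/disestablish rule can be modelled with a toy syscall table.
This is a hedged sketch, not the kernel implementation: the table, the
handler functions, their return values and the layout of the bit array are
all hypothetical; only the rule itself (install over nosys or nomodule
entries, restore the original from a const bit array) comes from the
description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*model_syscall_t)(void);

/* Placeholder handlers; the return values are arbitrary for the model. */
static int model_nosys(void) { return ENOSYS; }
static int model_nomodule(void) { return ENOENT; }
static int model_real(void) { return 0; }

#define MODEL_NSYSENT 4

static model_syscall_t model_sysent[MODEL_NSYSENT] = {
	model_real, model_nosys, model_nomodule, model_nomodule
};

/* Const bit array: bit n set means entry n was originally nomodule. */
static const unsigned char model_nomodbits[1] = { 0x0c };

/* Install fn only if the slot holds one of the two placeholder handlers. */
static int
model_establish(size_t n, model_syscall_t fn)
{
	assert(n < MODEL_NSYSENT);
	if (model_sysent[n] != model_nosys && model_sysent[n] != model_nomodule)
		return EBUSY;
	model_sysent[n] = fn;
	return 0;
}

/* Restore the original placeholder, as recorded in the bit array. */
static void
model_disestablish(size_t n)
{
	bool was_nomodule;

	assert(n < MODEL_NSYSENT);
	was_nomodule = (model_nomodbits[n / 8] & (1u << (n % 8))) != 0;
	model_sysent[n] = was_nomodule ? model_nomodule : model_nosys;
}
```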

XXX Regen of all files created by makesyscalls.sh will come soon,
XXX followed by a kernel version bump (since struct emul is being
XXX modified).

This commit should address PR kern/45781 and also removes the need
for the work-around for that PR in file

sys/arch/usermode/modules/syscallemu/syscallemu.c


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.348 09-May-2018 kre

branches: 1.348.2;

Cause a process's user and system times to become non-decreasing.

This alters the invented values (ie: statistically calculated)
that are returned - for small values, the values are likely going to
be different than they were, but that's largely nonsense anyway
(except that the sum of utime & stime does equal cpu time consumed
by the process). Once the values get large enough to be meaningful
the difference made by this change will be in the noise, and irrelevant.

This needs a couple of additions to struct proc, so we are now into 8.99.17
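
One way to make statistically derived times non-decreasing is to remember
the values last reported and never return less. The sketch below shows that
idea only; it is an assumption about the general technique, not the actual
kernel code, and the structure and function names are hypothetical. Note
that after clamping, the two values in this model no longer necessarily sum
to the runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process record of the values reported last time. */
struct model_times {
	uint64_t last_utime;	/* microseconds */
	uint64_t last_stime;	/* microseconds */
};

/*
 * Split runtime_us between user and system time in proportion to the
 * sampled tick counts, then clamp so neither value ever goes backwards.
 * (The multiplication can overflow for very large inputs; a real
 * implementation would have to guard against that.)
 */
static void
model_calc_times(struct model_times *mt, uint64_t runtime_us,
    uint64_t uticks, uint64_t sticks, uint64_t *utime, uint64_t *stime)
{
	uint64_t total = uticks + sticks;
	uint64_t u = 0, s = 0;

	assert(mt != NULL && utime != NULL && stime != NULL);
	if (total != 0) {
		u = runtime_us * uticks / total;
		s = runtime_us - u;
	}
	if (u < mt->last_utime)
		u = mt->last_utime;
	if (s < mt->last_stime)
		s = mt->last_stime;
	mt->last_utime = u;
	mt->last_stime = s;
	*utime = u;
	*stime = s;
}
```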


# 1.347 06-May-2018 kamil

Remove an element from struct emul: e_tracesig

e_tracesig used to be implemented for Darwin compat. Nowadays the Darwin
compatibility layer is gone and there are no other users.

This functionality isn't used where it shall be used in the existing
codebase.

If we want to emulate debugging interfaces in compat layers we would need
to implement that from scratch anyway. We would need to be bug compatible
with other OSes too.

Proposed on tech-kern@.

Welcome to NetBSD 8.99.16!

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422
# 1.346 19-Apr-2018 christos

s/static inline/static __inline/g for consistency with other include
headers.


# 1.345 16-Apr-2018 kamil

Remove the rnewprocp argument from fork1(9)

It's now unused and it can cause use-after-free scenarios as noted by
<Mateusz Guzik>.

Reference: http://mail-index.netbsd.org/tech-kern/2017/09/08/msg022267.html

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.344 09-Jan-2018 maya

branches: 1.344.2;
remove struct emul's e_fault.

It used to be used by COMPAT_IRIX for the purpose of overriding
uvm_fault (only implemented in MIPS), now removed.

Ride 8.99.12 version bump.


Revision tags: tls-maxphys-base-20171202
# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

branches: 1.340.6;
factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API and its usage.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, so remove all traces of it without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel
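
A debugger-side sketch of the DR6/DR7 handling mentioned in the list: on
x86, bits 0-3 of DR6 report which of DR0-DR3 triggered, and DR7 holds a
local/global enable bit pair per register. The helper names are
hypothetical; a tracer would obtain and store the register values through
PT_GETDBREGS and PT_SETDBREGS, which this userland model does not call.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the index (0..3) of the first debug register whose hit bit is
 * set in DR6, or -1 if none is set.
 */
static int
model_dr6_first_hit(uint64_t dr6)
{
	for (int i = 0; i < 4; i++) {
		if (dr6 & (UINT64_C(1) << i))
			return i;
	}
	return -1;
}

/* Clear the local and global enable bits for DRn in a DR7 value. */
static uint64_t
model_dr7_disable(uint64_t dr7, int n)
{
	assert(n >= 0 && n < 4);
	return dr7 & ~(UINT64_C(3) << (2 * n));
}
```

Since the kernel does not remove a trap after it fires, the tracer would
write the value returned by model_dr7_disable() back to detach it.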

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
	int	pe_report_event;
	pid_t	pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
	int	pe_report_event;
	union {
		pid_t	_pe_other_pid;
		lwpid_t	_pe_lwp;
	} _option;
} ptrace_state_t;

#define	pe_other_pid	_option._pe_other_pid
#define	pe_lwp		_option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
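
The size claim can be checked at compile time. The sketch below uses
stand-in typedefs (model_pid_t, model_lwpid_t, both assumed to be int32_t as
stated above) and hypothetical struct names for the old and new layouts, so
that it builds outside a NetBSD tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel typedefs; both assumed to be int32_t. */
typedef int32_t model_pid_t;
typedef int32_t model_lwpid_t;

/* Layout before the change. */
struct old_ptrace_state {
	int		pe_report_event;
	model_pid_t	pe_other_pid;
};

/* Layout after the change: the second field becomes a union. */
struct new_ptrace_state {
	int		pe_report_event;
	union {
		model_pid_t	_pe_other_pid;
		model_lwpid_t	_pe_lwp;
	} _option;
};

/* The union adds no storage and sits at the same offset. */
static_assert(sizeof(struct old_ptrace_state) ==
    sizeof(struct new_ptrace_state), "ptrace_state size must not change");
static_assert(offsetof(struct old_ptrace_state, pe_other_pid) ==
    offsetof(struct new_ptrace_state, _option),
    "second field must stay at the same offset");
```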


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP termination is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
parent gives birth to new process child and stops till it exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE notifies a debugger that a parent has
resumed after vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
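
With the fields split, the composite wait(2) status can be rebuilt on
demand. The sketch below uses the traditional encoding (exit code in bits
8-15, signal number in the low 7 bits, core flag 0x80); the function and
macro names are hypothetical, and the kernel's own helper may look
different.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_WCOREFLAG	0x80	/* traditional "dumped core" bit */

/* Build a traditional wait(2) status word from the split fields. */
static int
model_wait_status(int xexit, int xsig, bool coredumped)
{
	assert(xsig >= 0 && xsig < 0x7f);
	if (xsig != 0)
		return xsig | (coredumped ? MODEL_WCOREFLAG : 0);
	return (xexit & 0xff) << 8;
}
```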


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update a few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.
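
As an illustration of what maintaining a process counter with atomics can
look like, here is a minimal sketch using C11 <stdatomic.h> as a stand-in.
It is not the kernel code: the names demo_nprocs, demo_proc_reserve() and
demo_proc_release(), and the reserve-then-roll-back limit check, are
assumptions made for this example only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical global process counter, updated without holding a lock. */
static atomic_uint demo_nprocs;

/*
 * Try to account for one more process.  The counter is bumped first and
 * rolled back if the (hypothetical) limit would be exceeded, so no lock
 * is needed around the check.
 */
static bool
demo_proc_reserve(unsigned limit)
{
	unsigned count = atomic_fetch_add(&demo_nprocs, 1) + 1;

	if (count > limit) {
		atomic_fetch_sub(&demo_nprocs, 1);
		return false;
	}
	return true;
}

/* Drop one process from the count, e.g. when it is finally freed. */
static void
demo_proc_release(void)
{
	unsigned old = atomic_fetch_sub(&demo_nprocs, 1);

	assert(old > 0);	/* releasing more than was reserved is a bug */
	(void)old;
}
```

A caller would pair each successful demo_proc_reserve() with exactly one
demo_proc_release(); the counter may briefly overshoot the limit between
the add and the rollback, which this sketch accepts.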


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give the same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time.


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but no longer do.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
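
To illustrate point #1, here is a minimal sketch of why a fixed-point
estimate decays more slowly than an integer one under high load. It uses
the classic BSD decay factor (2 * loadavg) / (2 * loadavg + 1); the
DEMO_FSHIFT value, the demo_* names and the 64-bit intermediate are
assumptions for this example, not the actual scheduler code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-point format: 11 fractional bits. */
#define DEMO_FSHIFT	11
#define DEMO_FSCALE	(1u << DEMO_FSHIFT)

typedef uint32_t demo_fixpt_t;

/*
 * Fixed-point decay: estcpu * loadfac / (loadfac + 1.0), where
 * loadfac = 2 * loadavg.  Both arguments are scaled by DEMO_FSCALE.
 */
static demo_fixpt_t
demo_decay_fixpt(demo_fixpt_t loadavg, demo_fixpt_t estcpu)
{
	uint64_t loadfac = 2 * (uint64_t)loadavg;
	demo_fixpt_t next;

	next = (demo_fixpt_t)((loadfac * estcpu) / (loadfac + DEMO_FSCALE));
	assert(next <= estcpu);	/* decay never raises the estimate */
	return next;
}

/*
 * Integer decay with the same factor.  Because of truncation, any
 * non-zero estcpu drops by at least 1 per call, whatever the load.
 */
static unsigned
demo_decay_int(unsigned loadavg, unsigned estcpu)
{
	unsigned loadfac = 2 * loadavg;

	return (loadfac * estcpu) / (loadfac + 1);
}
```

With a load average of 20, demo_decay_int(20, 1) truncates straight to 0,
while demo_decay_fixpt(20 * DEMO_FSCALE, 1 * DEMO_FSCALE) yields 1998 out
of 2048, i.e. the estimate loses only about 2.4% per step.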


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
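
A minimal sketch of the marker-skipping idea, assuming a simple singly
linked list: the struct demo_proc layout, the is_marker flag and the
DEMO_PROCLIST_FOREACH name are hypothetical and only illustrate how a
FOREACH-style macro can hide placeholder entries from its loop body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; is_marker is nonzero for iterator placeholders. */
struct demo_proc {
	struct demo_proc *next;
	int pid;
	int is_marker;
};

/* Advance past any marker entries, returning the next real entry or NULL. */
static struct demo_proc *
demo_skip_markers(struct demo_proc *p)
{
	while (p != NULL && p->is_marker)
		p = p->next;
	return p;
}

/* Like a plain list walk, but marker entries are never handed to the body. */
#define DEMO_PROCLIST_FOREACH(var, head)				\
	for ((var) = demo_skip_markers(head);				\
	    (var) != NULL;						\
	    (var) = demo_skip_markers((var)->next))

/* Example consumer: count the real entries on the list. */
static int
demo_count_procs(struct demo_proc *head)
{
	struct demo_proc *p;
	int n = 0;

	DEMO_PROCLIST_FOREACH(p, head) {
		assert(!p->is_marker);	/* markers are filtered by the macro */
		n++;
	}
	return n;
}
```

An iterator that needs to drop the list lock while calling out can leave
such a marker behind to remember its position; other walkers using the
macro simply step over it.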


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep awaiting the debugger's opinion
about the signal without any problem. The hook is hence located in
issignal, at the place where normally SIGCHLD is sent to the debugger
while the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception) and stopping the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

If sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.
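
The powerpc/oea item above (count the executable mappings in each segment
and set the no-exec bit iff the count is 0) can be sketched in user-space C
as follows. This is only an illustration of the counting idea: the
structure, the function names and the segment count are assumptions, not
the pmap code from this commit.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NSEGS 16	/* illustrative segment count */

/* Hypothetical per-address-space bookkeeping. */
struct demo_pmap {
	unsigned exec_count[DEMO_NSEGS];	/* executable mappings per segment */
	bool noexec[DEMO_NSEGS];		/* models the per-segment no-exec bit */
};

/* Start with no executable mappings, so every segment is no-exec. */
static void
demo_pmap_init(struct demo_pmap *pm)
{
	for (int i = 0; i < DEMO_NSEGS; i++) {
		pm->exec_count[i] = 0;
		pm->noexec[i] = true;
	}
}

/* An executable mapping was entered in segment seg. */
static void
demo_exec_enter(struct demo_pmap *pm, unsigned seg)
{
	assert(seg < DEMO_NSEGS);
	pm->exec_count[seg]++;
	pm->noexec[seg] = false;
}

/* An executable mapping was removed from segment seg. */
static void
demo_exec_remove(struct demo_pmap *pm, unsigned seg)
{
	assert(seg < DEMO_NSEGS);
	assert(pm->exec_count[seg] > 0);
	pm->exec_count[seg]--;
	/* The bit is set if and only if no executable mapping remains. */
	pm->noexec[seg] = (pm->exec_count[seg] == 0);
}
```

The price of segment-level granularity is that a single executable page
makes the whole segment executable, which is why a count rather than a
flag is needed.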


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework.
Currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals.

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.
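
As a rough illustration of the "extra argument" convention described
above, the sketch below switches to an explicitly named process, or falls
back to a chooser when NULL is passed. It is a user-space model under
stated assumptions: every demo_* name is hypothetical, and unlike the
kernel routine the chooser returns NULL instead of waiting when the queue
is empty.

```c
#include <assert.h>
#include <stddef.h>

struct demo_proc {
	int pid;
	struct demo_proc *next;	/* run queue linkage */
};

static struct demo_proc *demo_runq;	/* head of a single run queue */
static struct demo_proc *demo_curproc;	/* process "on the CPU" */

/* Put a process at the head of the run queue. */
static void
demo_setrunqueue(struct demo_proc *p)
{
	assert(p != NULL);
	p->next = demo_runq;
	demo_runq = p;
}

/*
 * Take the next runnable process off the queue. The kernel routine would
 * wait for one to appear; this sketch returns NULL when the queue is empty.
 */
static struct demo_proc *
demo_chooseproc(void)
{
	struct demo_proc *p = demo_runq;

	if (p != NULL) {
		demo_runq = p->next;
		p->next = NULL;
	}
	return p;
}

/*
 * Switch to newp, or let the chooser pick when newp is NULL.
 * Returns 1 if a different process was switched in, 0 otherwise.
 */
static int
demo_switch(struct demo_proc *newp)
{
	if (newp == NULL)
		newp = demo_chooseproc();
	if (newp == NULL || newp == demo_curproc)
		return 0;	/* nothing runnable, or "switching to self" */
	demo_curproc = newp;
	return 1;
}
```

The early return for newp == demo_curproc is where a "switching to self"
optimisation would hook in: no context needs to be saved or restored.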


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct emul to provide an emulation-specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX-specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specificities should now be handled in OS-specific code. Linux
has been done, but other emulations should be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data collected so far:

                                   Net  Free Open Linux SunOS AIX  OSF1 Darwin
send SIGIO to write end of pipe    Y    N    N    N     N     N    Y    Y
send SIGIO to read end of pipe     Y    Y    N    N     N     ?    Y    ?
send SIGIO to write end of socket  Y    Y    Y    N     N     Y    Y    Y
send SIGIO to read end of socket   Y    Y    Y    Y     Y     ?    Y    ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke several
emulation package builds


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
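
The compatibility-macro pattern mentioned above can be sketched as
follows. This is a user-space model, not the kernel code: the demo_*
names and the lock type are hypothetical stand-ins, and the "sleep" only
records what it was given so the wrapper's effect can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a simple lock; not the kernel's type. */
struct demo_lock {
	int held;
};

/* Bookkeeping so the wrapper's behaviour is observable. */
static struct demo_lock *demo_last_interlock;
static int demo_calls;

/*
 * Stand-in for the new sleep primitive: takes an optional interlock that
 * is released once the (imaginary) scheduler lock is held.
 */
static int
demo_ltsleep(const void *chan, int pri, const char *wmesg, int timo,
    struct demo_lock *interlock)
{
	(void)chan; (void)pri; (void)wmesg; (void)timo;
	demo_last_interlock = interlock;
	demo_calls++;
	if (interlock != NULL) {
		assert(interlock->held);
		interlock->held = 0;	/* released on behalf of the sleeper */
	}
	return 0;
}

/* Old four-argument API, kept as a macro that passes no interlock. */
#define demo_tsleep(chan, pri, wmesg, timo) \
	demo_ltsleep((chan), (pri), (wmesg), (timo), NULL)
```

Existing callers keep compiling unchanged through the macro, while new
callers that need the race-free hand-off pass their lock explicitly.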


# 1.97 31-May-2000 thorpej

Track which CPU a process is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created processes (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.
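
The element-size argument described for KERN_PROC2 can be illustrated
with a small copy routine. This is a sketch of the idea only: the record
layout and the function name are assumptions, and the real
struct kinfo_proc2 has many more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record; stands in for a structure that grows over time. */
struct demo_kinfo {
	int pid;
	int ppid;
	int added_later;	/* imagine this was appended in a newer kernel */
};

/*
 * Copy up to count records into out, giving each one exactly elem_size
 * bytes as requested by the caller. A caller built against an older,
 * shorter layout gets a truncated prefix of each record; a caller asking
 * for more than the kernel knows about gets zero padding.
 * Returns the number of records copied.
 */
static size_t
demo_copy_records(const struct demo_kinfo *src, size_t nsrc,
    void *out, size_t elem_size, size_t count)
{
	unsigned char *dst = out;
	size_t n = nsrc < count ? nsrc : count;
	size_t ncopy = elem_size < sizeof(*src) ? elem_size : sizeof(*src);

	assert(src != NULL && out != NULL);
	for (size_t i = 0; i < n; i++) {
		memset(dst + i * elem_size, 0, elem_size);
		memcpy(dst + i * elem_size, &src[i], ncopy);
	}
	return n;
}
```

This only works if new fields are appended at the end of the structure, so
that every older layout is a prefix of the newer one.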


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
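
The sizing rule described above (a quarter of physical memory, clamped to
bounds) amounts to the arithmetic below. The function and parameter names
are illustrative assumptions; only the divide-by-4 and the lower and upper
bounds come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Size the map at one quarter of physical memory (in pages), then clamp
 * the result to the [min_pages, max_pages] range supplied by
 * machine-dependent code or the kernel config file.
 */
static size_t
demo_nkmempages(size_t physmem_pages, size_t min_pages, size_t max_pages)
{
	size_t npages = physmem_pages / 4;

	assert(min_pages <= max_pages);
	if (npages < min_pages)
		npages = min_pages;
	if (npages > max_pages)
		npages = max_pages;
	return npages;
}
```

For example, with bounds of 1024 and 8192 pages, a machine with 16384
pages of physical memory gets 4096 pages, while very small and very large
machines are pinned to the lower and upper bound respectively.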


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). Process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
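
A minimal sketch of the state test described above, assuming illustrative
state values and a cut-down proc structure (the names carry a _SKETCH or
_sketch suffix because they are not the definitions from sys/proc.h):

```c
#include <assert.h>

/* Illustrative state values; the real constants live in sys/proc.h. */
enum pstate_sketch { S_IDL = 1, S_RUN, S_SLEEP, S_STOP, S_ZOMB, S_DEAD };

struct proc_sketch {
	enum pstate_sketch p_stat;
};

/* True for a proc that has exited, whether or not the reaper has run. */
#define P_ZOMBIE_SKETCH(p) \
	((p)->p_stat == S_ZOMB || (p)->p_stat == S_DEAD)

/* The reaper's step as described: SDEAD becomes SZOMB so wait4 can see it. */
static void
reap_sketch(struct proc_sketch *p)
{
	assert(p->p_stat == S_DEAD);
	p->p_stat = S_ZOMB;
}
```

Code that only cares whether a process is gone can use the macro and need
not know whether the reaper has processed it yet.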


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.
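
The two points above could be sketched as follows. The struct and function
names are hypothetical stand-ins, not the kernel's; only the rule itself
(0 means no exit signal, reparenting to init forces SIGCHLD) comes from the
description.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant struct proc fields. */
struct xproc {
	int p_exitsig;		/* signal sent to the parent on exit; 0 = none */
	bool parent_is_init;	/* true once reparented to init */
};

/* Reparenting to init forces the conventional SIGCHLD. */
static void
reparent_to_init(struct xproc *p)
{
	p->parent_is_init = true;
	p->p_exitsig = SIGCHLD;
}

/* Returns the signal to deliver on exit, or 0 for "deliver nothing". */
static int
exit_signal(const struct xproc *p)
{
	assert(p->p_exitsig >= 0);
	return p->p_exitsig;	/* 0 is passed through, not mapped to SIGCHLD */
}
```

Doing the SIGCHLD substitution once, at reparent time, is what lets the
lookup stay a plain field read.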

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.
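
To illustrate the "low address and size" convention, here is a sketch of how
machine-dependent code might pick the child's initial stack pointer. The
function name and the grows_down parameter are assumptions for the example,
and alignment and any initial frame are deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pick the child's initial stack pointer.
 * stack == NULL: inherit the parent's stack pointer (traditional behavior).
 * Otherwise the area is [stack, stack + stack_size); a downward-growing
 * stack starts at the high end, an upward-growing one at the low end.
 */
static uintptr_t
child_sp(uintptr_t parent_sp, void *stack, size_t stack_size, int grows_down)
{
	uintptr_t low;

	if (stack == NULL)
		return parent_sp;
	assert(stack_size > 0);
	low = (uintptr_t)stack;
	return grows_down ? low + stack_size : low;
}
```

The caller never needs to know the stack direction; it always passes the
lowest address of the area and its length.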

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is the
header files as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adopt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.
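
A sketch of why a count works better than flag bits here; the struct and
function names are hypothetical and only the idea of a hold count comes from
the commit message.

```c
#include <assert.h>

/* Hypothetical cut-down proc with only the hold count. */
struct hproc {
	int p_holdcnt;	/* number of active holders keeping the proc in core */
};

/* Take a hold: the process must stay in core until released. */
static void
phold_sketch(struct hproc *p)
{
	p->p_holdcnt++;
}

/* Drop a hold taken earlier. */
static void
prele_sketch(struct hproc *p)
{
	assert(p->p_holdcnt > 0);
	p->p_holdcnt--;
}

/* The swapper may only consider a process nobody is holding. */
static int
swappable_sketch(const struct hproc *p)
{
	return p->p_holdcnt == 0;
}
```

With a single flag bit, the first holder to finish would clear the flag
while another holder still depended on it; a count lets holds nest safely.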


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location, and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.372 11-Jul-2023 riastradh

sys: Rip <sys/resourcevar.h> out of <uvm/uvm_param.h>.

And thus out of <sys/param.h>, which is exceedingly overused and
fragile and delenda est.

Should fix (some) issues with the recent inclusion of machine/lock.h
in various machine/mutex.h files.


# 1.371 01-May-2023 mlelstv

Default PROC_MACHINE_ARCH to machine_arch and use this for magic
symlinks to resolve "@machine_arch".

This keeps behaviour of magic symlinks and 'uname -p' output the same.
Fixes PR 57320.


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.370 09-May-2022 wiz

branches: 1.370.4;
fix typo in comment


# 1.369 10-Oct-2021 thorpej

Changes to make EVFILT_PROC MP-safe:

Because the locking protocol around processes is somewhat complex
compared to other events that can be posted on kqueues, introduce
new functions for posting NOTE_EXEC, NOTE_EXIT, and NOTE_FORK,
rather than just using the generic knote() function. These functions
KASSERT() their locking expectations, and deal with other complexities
for each situation.

knote_proc_fork(), in particular, needs to handle NOTE_TRACK, which
requires allocation of a new knote to attach to the child process. We
don't want to be allocating memory while holding the parent's p_lock.
Furthermore, we also have to attach the tracking note to the child
process, which means we have to acquire the child's p_lock.

So, to handle all this, we introduce some additional synchronization
infrastructure around the 'knote' structure:

- Add the ability to mark a knote as being in a state of flux. Knotes
in this state are guaranteed not to be detached/deleted, thus allowing
a code path to drop other locks after putting a knote in this state.

- Code paths that wish to detach/delete a knote must first check if the
knote is in-flux. If so, they must wait for it to quiesce. Because
multiple threads of execution may attempt this concurrently, a mechanism
exists for a single LWP to claim the detach responsibility; all other
threads simply wait for the knote to disappear before they can make
further progress.

- When kqueue_scan() encounters an in-flux knote, it simply treats the
situation just like encountering another thread's queue marker -- wait
for the flux to settle and continue on.

(The "in-flux knote" idea was inspired by FreeBSD, but this works differently
from their implementation, as the two kqueue implementations have diverged
quite a bit.)

knote_proc_fork() uses this infrastructure to implement NOTE_TRACK like so:

- Attempt to put the original tracking knote into a state of flux; if this
fails (because the note has a detach pending), we skip all processing
(the original process has lost interest, and we simply won the race).

- Once the note is in-flux, drop the kq and forking process's locks, and
allocate 2 knotes: one to post the NOTE_CHILD event, and one to attach
a new NOTE_TRACK to the child process. Notably, we do NOT go through
kqueue_register() to do this, but rather do all of the work directly
and KASSERT() our assumptions; this allows us to directly control our
interaction with locks. All memory allocations here are performed with
KM_NOSLEEP, in order to prevent holding the original knote in-flux
indefinitely.

- Because the NOTE_TRACK use case adds knotes to kqueues through a
sort of back-door mechanism, we must serialize with the closing of
the destination kqueue's file descriptor, so steal another bit from
the kq_count field to notify other threads that a kqueue is on its
way out to prevent new knotes from being enqueued while the close
path detaches them.

In addition to fixing EVFILT_PROC's reliance on KERNEL_LOCK, this also
fixes a long-standing bug whereby a NOTE_CHILD event could be dropped
if the child process exited before the interested process received the
NOTE_CHILD event (the same knote would be used to deliver the NOTE_EXIT
event, and would clobber the NOTE_CHILD's 'data' field).

Add a bunch of comments to explain what's going on in various critical
sections, and sprinkle additional KASSERT()s to validate assumptions
in several more locations.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.368 05-Dec-2020 thorpej

Refactor interval timers to make it possible to support types other than
the BSD/POSIX per-process timers:

- "struct ptimer" is split into "struct itimer" (common interval timer
data) and "struct ptimer" (per-process timer data, which contains a
"struct itimer").

- Introduce a new "struct itimer_ops" that supplies information about
the specific kind of interval timer, including its processing
queue, the softint handle used to schedule processing, the function
to call when the timer fires (which adds it to the queue), and an
optional function to call when the CLOCK_REALTIME clock is changed by
a call to clock_settime() or settimeofday().

- Rename some functions to clearly identify what they're operating on
(ptimer vs itimer).

- Use kmem(9) to allocate ptimer-related structures, rather than having
dedicated pools for them.

Welcome to NetBSD 9.99.77.


# 1.367 23-May-2020 ad

branches: 1.367.2;
Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.366 23-May-2020 ad

- Replace pid_table_lock with a lockless lookup covered by pserialize, with
the "writer" side being pid_table expansion. The basic idea is that when
doing an LWP lookup there is usually already a lock held (p->p_lock), or a
spin mutex that needs to be taken (l->l_mutex), and either can be used to
get the found LWP stable and confidently determine that all is correct.

- For user processes LSLARVAL implies the same thing as LSIDL ("not visible
by ID"), and lookup by ID in proc0 doesn't really happen. In-tree the new
state should be understood by top(1), the tty subsystem and so on, and
would attract the attention of 3rd party kernel grovellers in time, so
remove it and just rely on LSIDL.


# 1.365 07-May-2020 kamil

On debugger attach to a prestarted process don't report SIGTRAP

Introduce PSL_TRACEDCHILD that indicates tracking of birth of a process.
A freshly forked process checks whether it is traced and if so, reports
SIGTRAP + TRAP_CHLD event to a debugger as a result of tracking forks-like
events. There is a time window when a debugger can attach to a newly
created process and receive SIGTRAP + TRAP_CHLD instead of SIGSTOP.

Fixes races in t_ptrace_wait* tests when a test hangs or misbehaves,
especially the ones reported in tracer_sysctl_lookup_without_duplicates.


# 1.364 29-Apr-2020 thorpej

- proc_find() retains traditional semantics of requiring the canonical
PID to look up a proc. Add a separate proc_find_lwpid() to look up a
proc by the ID of any of its LWPs.
- Add proc_find_lwp_acquire_proc(), which enables looking up the LWP
*and* a proc given the ID of any LWP. Returns with the proc::p_lock
held.
- Rewrite lwp_find2() in terms of proc_find_lwp_acquire_proc(), and
allow the proc to be wildcarded, rather than just curproc or specific
proc.
- lwp_find2() now subsumes the original intent of lwp_getref_lwpid(), but
in a much nicer way, so garbage-collect the remnants of that recently
added mechanism.


Revision tags: bouyer-xenpvh-base2
# 1.363 24-Apr-2020 thorpej

Overhaul the way LWP IDs are allocated. Instead of each process having its
own LWP ID space, LWP IDs now come from the same number space as PIDs. The
lead LWP of a process gets the PID as its LID. If a multi-LWP process's
lead LWP exits, the PID persists for the process.

In addition to providing system-wide unique thread IDs, this also lets us
eliminate the per-process LWP radix tree, and some associated locks.

Remove the separate "global thread ID" map added previously; it is no longer
needed to provide this functionality.

Nudged in this direction by ad@ and chs@.


Revision tags: phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.362 06-Apr-2020 kamil

branches: 1.362.2;
Reintroduce struct proc::p_oppid

Relying on p_opptr is not safe as there is a race between:
- spawner giving birth to a child process and being killed
- spawnee accessing p_opptr and reporting TRAP_CHLD

PR kern/54786 by Andreas Gustafsson


# 1.361 05-Apr-2020 christos

There is no "s" lock.


# 1.360 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.359 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.358 29-Jan-2020 ad

- Track LWPs in a per-process radixtree. It uses no extra memory in the
single threaded case. Replace scans of p->p_lwps with lookups in the
tree. Find free LIDs for new LWPs in the tree. Replace the hashed sleep
queues for park/unpark with lookups in the tree under cover of a RW lock.

- lwp_wait(): if waiting on a specific LWP, find the LWP via tree lookup and
return EINVAL if it's detached, not ESRCH.

- Group the locks in struct proc at the end of the struct in their own cache
line.

- Add some comments.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.357 12-Oct-2019 kamil

branches: 1.357.2;
Remove now unused p_oppid from struct proc


# 1.356 30-Sep-2019 kamil

Move TRAP_CHLD/TRAP_LWP ptrace information from struct proc to siginfo

Storing struct ptrace_state information inside struct proc was vulnerable
to synchronization bugs, as multiple events emitted at the same time were
overwriting other ones.

Cache the original parent process id in p_oppid. Reusing p_opptr here is
in theory prone to a slight race condition.

Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for calls
prompting for the value in cases when there wasn't registered an
appropriate event.

Add an alternative approach to check the ptrace_state information, directly
from the siginfo_t value returned from PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compat with older NetBSD and
OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.

Add a couple of compile-time asserts for assumptions in the code.

No functional change intended in existing ptrace(2) software.

All ATF ptrace(2) and ATF GDB tests pass.

This change improves reliability of the threading ptrace(2) code.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.355 15-Jul-2019 pgoyette

Move a comment line to get it next to the line it describes, avoiding
intervening unrelated text.

NFCI


# 1.354 21-Jun-2019 kamil

Eliminate PS_NOTIFYSTOP remnants from the kernel

This flag used to be useful in /proc (BSD4.4-style) debugging semantics.
Traced child events were notified without signaling the parent.

This property was removed in NetBSD-8.0 and had no users.

This change simplifies the signal code, removing dead branches.

NFCI


# 1.353 11-Jun-2019 kamil

Add support for PTRACE_POSIX_SPAWN to report posix_spawn(3) events

posix_spawn(3) is a first class syscall in NetBSD, different to
(V)FORK+EXEC as these operations are executed in one go. This differs to
Linux and FreeBSD, where posix_spawn(3) is implemented with existing kernel
primitives (clone(2), vfork(2), exec(3)) inside libc.

Typically LLDB and GDB software is aware of FORK/VFORK events. As discussed
with the LLDB community, instead of slicing the posix_spawn(3) operation
into phases emulating (V)FORK+EXEC(+VFORK_DONE) and returning intermediate
state to the debugger, that might have abnormal state, introduce new event
type: PTRACE_POSIX_SPAWN.

A debugger implementor can easily map it into existing fork+exec semantics
or treat as a distinct event.

There is no functional change for existing debuggers as there was no
support for reporting posix_spawn(3) events on the kernel side.


Revision tags: phil-wifi-20190609 isaki-audio2-base
# 1.352 06-Apr-2019 kamil

Centralized shared part of child_return() into MI part

Add a new function md_child_return() for MD specific bits only.

New child_return() is now part of MI and central code that handles
uniformly tracing code (KTR and ptrace(2)).

Synchronize value passed to ktrsysret() among ports to SYS_fork. This is
a traditional value and accessing p_lflag to check for PL_PPWAIT shall
use locking against proc_lock. Returning SYS_fork vs SYS_vfork still isn't
correct enough as there are more entry points to forking code. Instead of
making it too good, just settle with plain SYS_fork for all ports.


# 1.351 01-Mar-2019 christos

PR/53998: Joel Bertrand: Limit the number of semaphores on a
per-user basis not a per-process. We cannot really keep track on
a per-process basis because a parent process can create the semaphore
and a child can free it taking credit for it. There is also a
similar issue about resource exhaustion if we limited the number
of lwps per process as opposed to per user (which we don't).


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.350 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
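
The tri-state setting and its defaults listed above can be sketched as a
small decision function. The enum and function names are hypothetical; only
the meanings of 0, 1 and 2 and the KASLR-dependent default come from the
commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Values of the tri-state setting as listed in the commit message. */
enum expose_sketch {
	EXPOSE_NONE = 0,	/* no access */
	EXPOSE_KMEM = 1,	/* processes with /dev/kmem open */
	EXPOSE_ALL  = 2		/* everyone */
};

/* Default: hide addresses on KASLR kernels, allow kmem users otherwise. */
static enum expose_sketch
default_expose(bool kaslr_kernel)
{
	return kaslr_kernel ? EXPOSE_NONE : EXPOSE_KMEM;
}

/* Decide once per request whether kernel addresses may be reported. */
static bool
may_see_addresses(int setting, bool has_kmem_open)
{
	assert(setting >= EXPOSE_NONE && setting <= EXPOSE_ALL);
	switch (setting) {
	case EXPOSE_ALL:
		return true;
	case EXPOSE_KMEM:
		return has_kmem_open;
	default:
		return false;
	}
}
```

Evaluating the decision once per sysctl request and reusing the result for
every process in the reply is the efficiency point made in the second item.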


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.349 10-Aug-2018 pgoyette

Allow syscall_establish() to install new syscalls when the existing
entry-point is either sys_nomodule or sys_nosys. Update the
makesyscalls.sh script to create a const array of bits to allow
syscall_disestablish() to properly restore the original entry-point.
Update all the initializers of struct emul to initialize the pointer
to the bit array struct emul.

XXX Regen of all files created by makesyscalls.sh will come soon,
XXX followed by a kernel version bump (since struct emul is being
XXX modified).

This commit should address PR kern/45781 and also removes the need
for the work-around for that PR in file

sys/arch/usermode/modules/syscallemu/syscallemu.c


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.348 09-May-2018 kre

branches: 1.348.2;

Cause a process's user and system times to become non-decreasing.

This alters the invented values (ie: statistically calculated)
that are returned - for small values, the values are likely going to
be different than they were, but that's largely nonsense anyway
(except that the sum of utime & stime does equal cpu time consumed
by the process). Once the values get large enough to be meaningful
the difference made by this change will be in the noise, and irrelevant.
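
One way to make statistically estimated times non-decreasing is to remember
the last values reported and never go below them. The sketch below shows
that idea only; the struct, field and function names are hypothetical, the
actual kernel change may compute things differently, and this simple clamp
does not by itself keep the sum equal to the total runtime.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-process memory of the last values handed out. */
struct times_sketch {
	uint64_t last_u;	/* last reported user time */
	uint64_t last_s;	/* last reported system time */
};

/*
 * Split runtime between user and system time in proportion to the sampled
 * tick counts, then clamp each value so it never goes backwards.
 */
static void
report_times(struct times_sketch *t, uint64_t runtime,
    uint64_t uticks, uint64_t sticks, uint64_t *u, uint64_t *s)
{
	uint64_t total = uticks + sticks;
	uint64_t est_u = total != 0 ? runtime * uticks / total : 0;
	uint64_t est_s = runtime - est_u;

	assert(t != 0 && u != 0 && s != 0);
	if (est_u < t->last_u)
		est_u = t->last_u;
	if (est_s < t->last_s)
		est_s = t->last_s;
	t->last_u = *u = est_u;
	t->last_s = *s = est_s;
}
```

Without the clamp, a burst of system-time ticks can shrink the user-time
share of the estimate even though the process consumed more CPU overall.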

This needs a couple of additions to struct proc, so we are now into 8.99.17


# 1.347 06-May-2018 kamil

Remove an element from struct emul: e_tracesig

e_tracesig used to be implemented for Darwin compat. Nowadays the Darwin
compatibility layer is gone and there are no other users.

This functionality isn't used where it shall be used in the existing
codebase.

If we want to emulate debugging interfaces in compat layers we would need
to implement that from scratch anyway. We would need to be bug compatible
with other OSes too.

Proposed on tech-kern@.

Welcome to NetBSD 8.99.16!

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422
# 1.346 19-Apr-2018 christos

s/static inline/static __inline/g for consistency with other include
headers.


# 1.345 16-Apr-2018 kamil

Remove the rnewprocp argument from fork1(9)

It's now unused and it can cause use-after-free scenarios as noted by
<Mateusz Guzik>.

Reference: http://mail-index.netbsd.org/tech-kern/2017/09/08/msg022267.html

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.344 09-Jan-2018 maya

branches: 1.344.2;
remove struct emul's e_fault.

It used to be used by COMPAT_IRIX for the purpose of overriding
uvm_fault (only implemented in MIPS), now removed.

Ride 8.99.12 version bump.


Revision tags: tls-maxphys-base-20171202
# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files nor Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

branches: 1.340.6;
factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API and its usage.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, so it is removed entirely, without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible for setting global watchpoints/breakpoints with
the debug registers; the PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used for this
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event;
a debugger is responsible for detaching this trap with an appropriate
PT_SETDBREGS call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel
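
As a rough illustration of the debugger-side duties listed above, a tracer
might build a DR7 value and decode DR6 along these lines. All helper and
constant names here are hypothetical; only the bit positions (Ln at bit 2n,
R/Wn at bits 16+4n, LENn at bits 18+4n, Bn status at bit n) follow the x86
architecture. This is a sketch, not NetBSD or GDB code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; only the x86 bit layout is architectural. */
enum { DEMO_RW_EXEC = 0, DEMO_RW_WRITE = 1, DEMO_RW_RDWR = 3 };
enum { DEMO_LEN_1 = 0, DEMO_LEN_2 = 1, DEMO_LEN_8 = 2, DEMO_LEN_4 = 3 };

/* Return dr7 with slot n (0..3) locally enabled for the given type/length. */
static inline uint64_t
demo_dr7_enable_local(uint64_t dr7, unsigned n, unsigned rw, unsigned len)
{
	assert(n < 4);
	/* Clear the old R/Wn and LENn fields (4 bits per slot from bit 16). */
	dr7 &= ~((uint64_t)0xf << (16 + 4 * n));
	dr7 |= (uint64_t)(rw & 3) << (16 + 4 * n);
	dr7 |= (uint64_t)(len & 3) << (18 + 4 * n);
	dr7 |= (uint64_t)1 << (2 * n);	/* Ln: local enable */
	return dr7;
}

/* Return the lowest slot whose Bn bit is set in dr6, or -1 if none. */
static inline int
demo_dr6_triggered_slot(uint64_t dr6)
{
	for (int n = 0; n < 4; n++)
		if (dr6 & ((uint64_t)1 << n))
			return n;
	return -1;
}
```

A tracer would presumably pass such a value to PT_SETDBREGS for each LWP
and consult the DR6 copy from PT_GETDBREGS after a SIGTRAP/TRAP_DBREG stop.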

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
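
The size argument above can be checked at compile time. The sketch below
uses stand-in typedefs (demo_pid_t, demo_lwpid_t) on the assumption stated
in this commit that both are int32_t-like; the struct names are made up for
the comparison.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins: the commit describes both types as int32_t-like integers. */
typedef int32_t demo_pid_t;
typedef int32_t demo_lwpid_t;

struct demo_old_ptrace_state {
	int		pe_report_event;
	demo_pid_t	pe_other_pid;
};

struct demo_new_ptrace_state {
	int		pe_report_event;
	union {
		demo_pid_t	_pe_other_pid;
		demo_lwpid_t	_pe_lwp;
	} _option;
};

/* Same size, and the pid field keeps its offset inside the structure. */
_Static_assert(sizeof(struct demo_old_ptrace_state) ==
    sizeof(struct demo_new_ptrace_state), "ptrace_state size changed");
_Static_assert(offsetof(struct demo_old_ptrace_state, pe_other_pid) ==
    offsetof(struct demo_new_ptrace_state, _option._pe_other_pid),
    "pe_other_pid moved");
```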


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP termination is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
parent gives birth to new process child and stops till it exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE notifies a debugger that a parent has
resumed after vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
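
A minimal sketch of the idea, assuming a per-process flag that marks 32-bit
processes. The names demo_proc, DEMO_PK_32 and DEMO_PROC_PTRSZ are
hypothetical stand-ins, not the definitions from sys/proc.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct proc and its "32-bit process" flag. */
#define DEMO_PK_32	0x00000004

struct demo_proc {
	int	p_flag;
};

/* Pointer size as seen by the process, not by the kernel. */
#define DEMO_PROC_PTRSZ(p) \
	(((p)->p_flag & DEMO_PK_32) ? sizeof(uint32_t) : sizeof(void *))

/* Bytes needed for an argv-style vector of n pointers plus a terminator. */
static inline size_t
demo_vector_bytes(const struct demo_proc *p, size_t n)
{
	return (n + 1) * DEMO_PROC_PTRSZ(p);
}
```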


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.
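
With the pieces stored separately, a wait(2) status word can be rebuilt on
demand. The sketch below assumes the traditional BSD encoding (exit code in
bits 8-15, signal in the low 7 bits, core flag 0x80); the structure and
function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three pieces named in the commit. */
struct demo_exit_info {
	int	xexit;		/* exit code, meaningful if xsig == 0 */
	int	xsig;		/* terminating signal, or 0 */
	bool	coredumped;	/* stands in for the WCOREFLAG bit */
};

#define DEMO_WCOREFLAG	0x80	/* traditional core-dump bit */

/* Rebuild a traditional wait(2) status word from the split fields. */
static inline int
demo_wait_status(const struct demo_exit_info *ei)
{
	if (ei->xsig != 0)
		return (ei->xsig & 0x7f) |
		    (ei->coredumped ? DEMO_WCOREFLAG : 0);
	return (ei->xexit & 0xff) << 8;
}
```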

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.
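
A sketch of the caching idea, with hypothetical names (demo_cproc,
demo_reparent, demo_getppid): the cached PID is refreshed on the rare
reparenting path, so the frequent read no longer needs to follow the parent
pointer, which in a kernel would require holding a global lock.

```c
/* Hypothetical stand-in for the relevant struct proc fields. */
struct demo_cproc {
	int			p_pid;
	int			p_ppid;		/* cached parent PID */
	struct demo_cproc	*p_pptr;	/* parent pointer */
};

/* Reparenting is the slow path: update pointer and cache together. */
static inline void
demo_reparent(struct demo_cproc *child, struct demo_cproc *parent)
{
	child->p_pptr = parent;
	child->p_ppid = parent->p_pid;
}

/* The hot path reads the cached value without touching p_pptr. */
static inline int
demo_getppid(const struct demo_cproc *p)
{
	return p->p_ppid;
}
```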


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.
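
One way such a counter can be maintained without a lock is sketched below.
C11 <stdatomic.h> stands in for the kernel's own atomic operations, and the
names demo_nprocs, demo_proc_reserve and demo_proc_release are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical global process counter. */
static atomic_uint demo_nprocs;

/* Reserve a slot; returns 0 and undoes the increment if the limit is hit. */
static inline int
demo_proc_reserve(unsigned int maxproc)
{
	unsigned int count = atomic_fetch_add(&demo_nprocs, 1) + 1;

	if (count > maxproc) {
		atomic_fetch_sub(&demo_nprocs, 1);
		return 0;
	}
	return 1;
}

/* Give the slot back when the process goes away. */
static inline void
demo_proc_release(void)
{
	atomic_fetch_sub(&demo_nprocs, 1);
}
```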


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.
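
The first item in the list above describes a common class of bug: a
conversion truncates a tiny timeout to 0, and 0 is also the value that
means "no timeout". The sketch below shows one plausible shape of the fix;
the unit, the structure and the function name are assumptions, not the
actual _lwp_park() code.

```c
#include <stdint.h>

/* Hypothetical relative timeout, modeled on struct timespec. */
struct demo_timespec {
	int64_t	sec;
	long	nsec;
};

/*
 * Convert a relative timeout to whole microseconds for a sleep primitive
 * in which 0 means "sleep without a timeout".
 */
static inline uint64_t
demo_timeout_usec(const struct demo_timespec *rel)
{
	uint64_t usec;

	if (rel->sec < 0 || (rel->sec == 0 && rel->nsec <= 0))
		return 1;	/* already due: use the shortest timed sleep */
	usec = (uint64_t)rel->sec * 1000000u + (uint64_t)rel->nsec / 1000u;
	/* Truncation yields 0 for sub-microsecond waits; never return it. */
	return usec == 0 ? 1 : usec;
}
```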


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time.


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's a more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout from proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but no longer do.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem in which p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.
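
The fork-wait accounting described above can be illustrated with a small
userland model. This is only a sketch under stated assumptions: struct
model_proc, model_fork() and model_wait() are hypothetical names, not the
kernel's scheduler_fork_hook/scheduler_wait_hook, and the estcpu decay
step is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userland model; not the kernel's struct proc. */
struct model_proc {
	unsigned int estcpu;	/* estimated CPU usage */
	unsigned int forked;	/* estcpu inherited from the parent at fork */
};

/* The child starts with the parent's estcpu and records how much it inherited. */
static void
model_fork(const struct model_proc *parent, struct model_proc *child)
{
	assert(parent != NULL && child != NULL);
	child->estcpu = parent->estcpu;
	child->forked = parent->estcpu;
}

/* On wait, charge the parent only for what the child accumulated itself. */
static void
model_wait(struct model_proc *parent, const struct model_proc *child)
{
	assert(parent != NULL && child != NULL);
	if (child->estcpu > child->forked)
		parent->estcpu += child->estcpu - child->forked;
}
```

Adding the child's whole estcpu back in model_wait() would take a parent
at 10 to 23 after a child that used 3 ticks; subtracting the inherited
part leaves it at 13.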


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
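
A rough userland sketch of why a fixed-point p_estcpu helps. It assumes
the classic BSD decay factor 2*load / (2*load + 1) and 11 fraction bits.
All MODEL_* and model_* names are hypothetical and are not the kernel's
definitions.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_FSHIFT	11			/* assumed number of fraction bits */
#define MODEL_FSCALE	(1u << MODEL_FSHIFT)

typedef uint32_t model_fixpt_t;

/* Integer decay: loadfac is 2 * load average, estcpu is a plain count. */
static unsigned int
model_decay_int(unsigned int loadfac, unsigned int estcpu)
{
	return (loadfac * estcpu) / (loadfac + 1);
}

/* Fixed-point decay: both arguments are scaled by MODEL_FSCALE. */
static model_fixpt_t
model_decay_fix(model_fixpt_t loadfac, model_fixpt_t estcpu)
{
	assert(loadfac + MODEL_FSCALE > loadfac);	/* denominator must not wrap */
	return (model_fixpt_t)(((uint64_t)loadfac * estcpu) /
	    ((uint64_t)loadfac + MODEL_FSCALE));
}
```

With a load average of 20, the integer version truncates an estcpu of 1
to 0 in a single step, while the fixed-point version keeps most of the
value as a fraction.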


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
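
The marker-skipping iteration can be sketched in userland as follows.
This is an illustration only: the list layout, the is_marker field and
every model_*/MODEL_* name are assumptions, and the real macro and
proclist_foreach_call have their own definitions and locking rules.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; is_marker flags placeholder entries. */
struct model_proc {
	int pid;
	int is_marker;
	struct model_proc *next;
};

/* Advance past any marker entries, returning the next real entry or NULL. */
static struct model_proc *
model_skip_markers(struct model_proc *p)
{
	while (p != NULL && p->is_marker)
		p = p->next;
	return p;
}

/* Like a plain list walk, but marker entries are never visited. */
#define MODEL_PROCLIST_FOREACH(var, head)				\
	for ((var) = model_skip_markers(head); (var) != NULL;		\
	    (var) = model_skip_markers((var)->next))

/* Call callback for each real entry; stop early on a nonzero return. */
static int
model_foreach_call(struct model_proc *head,
    int (*callback)(struct model_proc *, void *), void *arg)
{
	struct model_proc *p;
	int ret = 0;

	assert(callback != NULL);
	MODEL_PROCLIST_FOREACH(p, head) {
		ret = (*callback)(p, arg);
		if (ret != 0)
			break;
	}
	return ret;
}

/* Example callback: count the entries visited. */
static int
model_count_cb(struct model_proc *p, void *arg)
{
	(void)p;
	(*(int *)arg)++;
	return 0;
}
```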


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep waiting for the debugger's opinion
about the signal without causing a problem. The hook is hence located in
issignal, at the place where normally SIGCHLD is sent to the debugger
while the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception) and stopping the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.
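
The switch-target selection described above can be modelled in userland
roughly as follows. This is a sketch, not the kernel code: the single
run queue and all model_* names are hypothetical, and where the real
chooseproc() waits for a process to appear, this stand-in simply returns
NULL when the queue is empty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical process with a run-queue link. */
struct model_proc {
	int pid;
	struct model_proc *next;
};

static struct model_proc *model_runq;	/* head of a single run queue */

/* Stand-in for chooseproc(): dequeue the first runnable process, if any. */
static struct model_proc *
model_chooseproc(void)
{
	struct model_proc *p = model_runq;

	if (p != NULL)
		model_runq = p->next;
	return p;
}

/*
 * Return the process to switch to. A NULL newp means "let the scheduler
 * pick"; a result equal to cur is the "switching to self" case, where a
 * caller could skip the context switch.
 */
static struct model_proc *
model_mi_switch(struct model_proc *cur, struct model_proc *newp)
{
	assert(cur != NULL);
	if (newp == NULL)
		newp = model_chooseproc();
	return newp;
}
```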


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct emul to provide an emulation-specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
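
The "NULL means use the default handler" convention can be sketched in
userland like this. Everything here is hypothetical (the model_* types,
the simplified two-argument signature, the counters); it only illustrates
the dispatch, not the mips trap handler or the real uvm_fault interface.

```c
#include <assert.h>
#include <stddef.h>

struct model_proc;

/* Simplified fault-handler signature for the model. */
typedef int (*model_fault_fn)(struct model_proc *, unsigned long);

struct model_emul {
	model_fault_fn e_fault;		/* NULL: use the default handler */
};

struct model_proc {
	const struct model_emul *p_emul;
	int default_faults;		/* faults handled by the default path */
	int emul_faults;		/* faults handled by the emulation hook */
};

/* Stand-in for the generic fault handler. */
static int
model_default_fault(struct model_proc *p, unsigned long va)
{
	(void)va;
	p->default_faults++;
	return 0;
}

/* Stand-in for an emulation-specific handler. */
static int
model_emul_fault(struct model_proc *p, unsigned long va)
{
	(void)va;
	p->emul_faults++;
	return 0;
}

/* Dispatch as a trap handler might: prefer the emulation hook if set. */
static int
model_trap_fault(struct model_proc *p, unsigned long va)
{
	assert(p != NULL && p->p_emul != NULL);
	if (p->p_emul->e_fault != NULL)
		return (*p->p_emul->e_fault)(p, va);
	return model_default_fault(p, va);
}
```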


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specifics should now be handled in OS specific code. Linux
has been done, but other emulations still need to be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data that has been collected so far:

                                   Net  Free Open Linux SunOS AIX  OSF1 Darwin
send SIGIO to write end of pipe    Y    N    N    N     N     N    Y    Y
send SIGIO to read end of pipe     Y    Y    N    N     N     ?    Y    ?
send SIGIO to write end of socket  Y    Y    Y    N     N     Y    Y    Y
send SIGIO to read end of socket   Y    Y    Y    Y     Y     ?    Y    ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke the
build of several emulation packages


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
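
The compatibility wrapper described above can be sketched roughly as follows.
This is an illustration only: the ltsleep() body is a stub, struct simplelock
is a hypothetical stand-in, and the exact kernel prototype (qualifiers,
argument types) is assumed rather than quoted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's simple lock type. */
struct simplelock { int locked; };

/* Records the interlock seen by the last call, for illustration only. */
static struct simplelock *last_interlock;

/*
 * Stub taking the usual tsleep() arguments plus an interlock.
 * A real implementation would release the interlock once the
 * scheduler is locked and then put the caller to sleep.
 */
static int
ltsleep(void *ident, int priority, const char *wmesg, int timo,
    struct simplelock *interlock)
{
	(void)ident; (void)priority; (void)wmesg; (void)timo;
	last_interlock = interlock;
	if (interlock != NULL)
		interlock->locked = 0;	/* model "interlock is released" */
	return 0;
}

/* Old API: the same call with no interlock. */
#define tsleep(ident, priority, wmesg, timo) \
	ltsleep((ident), (priority), (wmesg), (timo), NULL)
```

Existing callers keep writing tsleep(chan, pri, "wmesg", timo) unchanged, while
MP-aware callers pass the lock that protects the condition they sleep on.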


# 1.97 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.
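
The "caller supplies the element size" idea behind KERN_PROC2 can be sketched
as below. This is a simplified illustration, not the sysctl code: struct
kinfo_demo and copy_records() are hypothetical names, and the real
struct kinfo_proc2 carries many more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record; the real struct kinfo_proc2 has many more fields. */
struct kinfo_demo {
	int	pid;
	int	ppid;
	int	cpuid;	/* imagine this field was added in a later kernel */
};

/*
 * Copy up to `count` records into `out`, writing exactly `elem_size`
 * bytes per record. A caller built against an older, smaller layout
 * passes its own size and receives only the leading fields it knows.
 * Returns the number of records copied.
 */
static size_t
copy_records(const struct kinfo_demo *src, size_t nsrc,
    void *out, size_t elem_size, size_t count)
{
	size_t i, n = nsrc < count ? nsrc : count;
	size_t ncopy = elem_size < sizeof(*src) ? elem_size : sizeof(*src);
	unsigned char *dst = out;

	for (i = 0; i < n; i++) {
		memset(dst, 0, elem_size);	/* pad if caller's record is larger */
		memcpy(dst, &src[i], ncopy);
		dst += elem_size;
	}
	return n;
}
```

Because new fields are only ever appended, an old binary that asks for its
smaller record size keeps working after the structure grows.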


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
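
The sizing rule described above (a quarter of physical memory, clamped to
machine-dependent bounds) amounts to something like the sketch below. The
function and parameter names are illustrative assumptions, not the kernel's.

```c
#include <assert.h>

/*
 * A quarter of physical memory, clamped to [min_pages, max_pages].
 * All values are in pages.
 */
static unsigned long
kmem_pages_autosize(unsigned long physmem_pages,
    unsigned long min_pages, unsigned long max_pages)
{
	unsigned long npages = physmem_pages / 4;

	assert(min_pages <= max_pages);
	if (npages < min_pages)
		npages = min_pages;
	if (npages > max_pages)
		npages = max_pages;
	return npages;
}
```

In this sketch the min_pages and max_pages arguments play the role that
NKMEMPAGES_MIN and NKMEMPAGES_MAX play in the kernel config file.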


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). The process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
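
A macro of the kind described above might look like the following sketch. The
state numbering and struct proc_demo are assumptions made for illustration;
only the SZOMB-or-SDEAD test is taken from the commit message.

```c
#include <assert.h>

/* Illustrative state values; the kernel's actual numbering may differ. */
enum { SIDL = 1, SRUN, SSLEEP, SSTOP, SZOMB, SDEAD };

struct proc_demo { int p_stat; };	/* hypothetical minimal proc */

/* Treat both "dead, not yet reaped" and "zombie" as zombie states. */
#define P_ZOMBIE(p)	((p)->p_stat == SZOMB || (p)->p_stat == SDEAD)
```

Code that previously compared p_stat against SZOMB alone can use the macro and
so also covers processes still waiting for the reaper.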


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is the
header files as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
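
The effect of biasing by NZERO can be illustrated with the sketch below. The
helper names are hypothetical, and the user-visible range of -20..20 is an
assumption based on the conventional nice range.

```c
#include <assert.h>

#define NZERO	20	/* bias named in the commit */

/* Store a user-visible nice value (assumed -20..20) as an unsigned byte. */
static unsigned char
nice_to_p_nice(int nice)
{
	assert(nice >= -NZERO && nice <= NZERO);
	return (unsigned char)(nice + NZERO);
}

/* Recover the user-visible nice value from the stored byte. */
static int
p_nice_to_nice(unsigned char p_nice)
{
	return (int)p_nice - NZERO;
}
```

With the bias applied, the stored value never goes negative, which is what
makes an explicitly unsigned p_nice safe.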


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location, and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.371 01-May-2023 mlelstv

Default PROC_MACHINE_ARCH to machine_arch and use this for magic
symlinks to resolve "@machine_arch".

This keeps behaviour of magic symlinks and 'uname -p' output the same.
Fixes PR 57320.


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.370 09-May-2022 wiz

fix typo in comment


# 1.369 10-Oct-2021 thorpej

Changes to make EVFILT_PROC MP-safe:

Because the locking protocol around processes is somewhat complex
compared to other events that can be posted on kqueues, introduce
new functions for posting NOTE_EXEC, NOTE_EXIT, and NOTE_FORK,
rather than just using the generic knote() function. These functions
KASSERT() their locking expectations, and deal with other complexities
for each situation.

knote_proc_fork(), in particular, needs to handle NOTE_TRACK, which
requires allocation of a new knote to attach to the child process. We
don't want to be allocating memory while holding the parent's p_lock.
Furthermore, we also have to attach the tracking note to the child
process, which means we have to acquire the child's p_lock.

So, to handle all this, we introduce some additional synchronization
infrastructure around the 'knote' structure:

- Add the ability to mark a knote as being in a state of flux. Knotes
in this state are guaranteed not to be detached/deleted, thus allowing
a code path to drop other locks after putting a knote in this state.

- Code paths that wish to detach/delete a knote must first check if the
knote is in-flux. If so, they must wait for it to quiesce. Because
multiple threads of execution may attempt this concurrently, a mechanism
exists for a single LWP to claim the detach responsibility; all other
threads simply wait for the knote to disappear before they can make
further progress.

- When kqueue_scan() encounters an in-flux knote, it simply treats the
situation just like encountering another thread's queue marker -- wait
for the flux to settle and continue on.

(The "in-flux knote" idea was inspired by FreeBSD, but this works differently
from their implementation, as the two kqueue implementations have diverged
quite a bit.)

knote_proc_fork() uses this infrastructure to implement NOTE_TRACK like so:

- Attempt to put the original tracking knote into a state of flux; if this
fails (because the note has a detach pending), we skip all processing
(the original process has lost interest, and we simply won the race).

- Once the note is in-flux, drop the kq and forking process's locks, and
allocate 2 knotes: one to post the NOTE_CHILD event, and one to attach
a new NOTE_TRACK to the child process. Notably, we do NOT go through
kqueue_register() to do this, but rather do all of the work directly
and KASSERT() our assumptions; this allows us to directly control our
interaction with locks. All memory allocations here are performed with
KM_NOSLEEP, in order to prevent holding the original knote in-flux
indefinitely.

- Because the NOTE_TRACK use case adds knotes to kqueues through a
sort of back-door mechanism, we must serialize with the closing of
the destination kqueue's file descriptor, so steal another bit from
the kq_count field to notify other threads that a kqueue is on its
way out to prevent new knotes from being enqueued while the close
path detaches them.

In addition to fixing EVFILT_PROC's reliance on KERNEL_LOCK, this also
fixes a long-standing bug whereby a NOTE_CHILD event could be dropped
if the child process exited before the interested process received the
NOTE_CHILD event (the same knote would be used to deliver the NOTE_EXIT
event, and would clobber the NOTE_CHILD's 'data' field).

Add a bunch of comments to explain what's going on in various critical
sections, and sprinkle additional KASSERT()s to validate assumptions
in several more locations.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.368 05-Dec-2020 thorpej

Refactor interval timers to make it possible to support types other than
the BSD/POSIX per-process timers:

- "struct ptimer" is split into "struct itimer" (common interval timer
data) and "struct ptimer" (per-process timer data, which contains a
"struct itimer").

- Introduce a new "struct itimer_ops" that supplies information about
the specific kind of interval timer, including its processing
queue, the softint handle used to schedule processing, the function
to call when the timer fires (which adds it to the queue), and an
optional function to call when the CLOCK_REALTIME clock is changed by
a call to clock_settime() or settimeofday().

- Rename some functions to clearly identify what they're operating on
(ptimer vs itimer).

- Use kmem(9) to allocate ptimer-related structures, rather than having
dedicated pools for them.

Welcome to NetBSD 9.99.77.


# 1.367 23-May-2020 ad

branches: 1.367.2;
Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.366 23-May-2020 ad

- Replace pid_table_lock with a lockless lookup covered by pserialize, with
the "writer" side being pid_table expansion. The basic idea is that when
doing an LWP lookup there is usually already a lock held (p->p_lock), or a
spin mutex that needs to be taken (l->l_mutex), and either can be used to
get the found LWP stable and confidently determine that all is correct.

- For user processes LSLARVAL implies the same thing as LSIDL ("not visible
by ID"), and lookup by ID in proc0 doesn't really happen. In-tree the new
state should be understood by top(1), the tty subsystem and so on, and
would attract the attention of 3rd party kernel grovellers in time, so
remove it and just rely on LSIDL.


# 1.365 07-May-2020 kamil

On debugger attach to a prestarted process don't report SIGTRAP

Introduce PSL_TRACEDCHILD that indicates tracking of birth of a process.
A freshly forked process checks whether it is traced and if so, reports
SIGTRAP + TRAP_CHLD event to a debugger as a result of tracking forks-like
events. There is a time window when a debugger can attach to a newly
created process and receive SIGTRAP + TRAP_CHLD instead of SIGSTOP.

Fixes races in t_ptrace_wait* tests when a test hangs or misbehaves,
especially the ones reported in tracer_sysctl_lookup_without_duplicates.


# 1.364 29-Apr-2020 thorpej

- proc_find() retains traditional semantics of requiring the canonical
PID to look up a proc. Add a separate proc_find_lwpid() to look up a
proc by the ID of any of its LWPs.
- Add proc_find_lwp_acquire_proc(), which enables looking up the LWP
*and* a proc given the ID of any LWP. Returns with the proc::p_lock
held.
- Rewrite lwp_find2() in terms of proc_find_lwp_acquire_proc(), and
allow the proc to be wildcarded, rather than just curproc or a specific
proc.
- lwp_find2() now subsumes the original intent of lwp_getref_lwpid(), but
in a much nicer way, so garbage-collect the remnants of that recently
added mechanism.


Revision tags: bouyer-xenpvh-base2
# 1.363 24-Apr-2020 thorpej

Overhaul the way LWP IDs are allocated. Instead of each process having its
own LWP ID space, LWP IDs now come from the same number space as PIDs. The
lead LWP of a process gets the PID as its LID. If a multi-LWP process's
lead LWP exits, the PID persists for the process.

In addition to providing system-wide unique thread IDs, this also lets us
eliminate the per-process LWP radix tree, and some associated locks.

Remove the separate "global thread ID" map added previously; it is no longer
needed to provide this functionality.

Nudged in this direction by ad@ and chs@.
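
As a rough illustration of the shared number space, here is a toy
allocator. The names (sk_id_alloc, sk_proc_create, sk_lwp_create) and
the tiny fixed-size table are assumptions made for the sketch; the real
pid_table code is considerably more involved.

```c
#include <assert.h>
#include <string.h>

#define SK_ID_MAX 16			/* tiny table, for illustration only */

/* One shared ID space for both PIDs and LWP IDs (hypothetical). */
static unsigned char sk_id_used[SK_ID_MAX];

/* Returns the lowest free ID, or -1 if the space is exhausted. */
static int
sk_id_alloc(void)
{
	for (int id = 1; id < SK_ID_MAX; id++) {
		if (!sk_id_used[id]) {
			sk_id_used[id] = 1;
			return id;
		}
	}
	return -1;
}

struct sk_proc {
	int pid;
};

/* A new process takes one ID; its lead LWP reuses the PID as its LID. */
static int
sk_proc_create(struct sk_proc *p, int *lead_lid)
{
	int id = sk_id_alloc();

	if (id < 0)
		return -1;
	p->pid = id;
	*lead_lid = id;
	return 0;
}

/* Additional LWPs draw from the same space, so LIDs are system-wide unique. */
static int
sk_lwp_create(struct sk_proc *p)
{
	(void)p;
	return sk_id_alloc();
}

/* Returns 1 if the invariants described in the log message hold. */
static int
sk_lwpid_demo(void)
{
	struct sk_proc a, b;
	int lead_a, lead_b, extra;

	memset(sk_id_used, 0, sizeof(sk_id_used));
	if (sk_proc_create(&a, &lead_a) != 0)
		return 0;
	extra = sk_lwp_create(&a);
	if (sk_proc_create(&b, &lead_b) != 0)
		return 0;

	assert(extra > 0);
	return lead_a == a.pid && lead_b == b.pid &&
	    extra != a.pid && extra != b.pid && a.pid != b.pid;
}
```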


Revision tags: phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.362 06-Apr-2020 kamil

branches: 1.362.2;
Reintroduce struct proc::p_oppid

Relying on p_opptr is not safe as there is a race between:
- spawner giving birth to a child process and being killed
- spawnee accessing p_opptr and reporting TRAP_CHLD

PR kern/54786 by Andreas Gustafsson


# 1.361 05-Apr-2020 christos

There is no "s" lock.


# 1.360 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.359 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.358 29-Jan-2020 ad

- Track LWPs in a per-process radixtree. It uses no extra memory in the
single threaded case. Replace scans of p->p_lwps with lookups in the
tree. Find free LIDs for new LWPs in the tree. Replace the hashed sleep
queues for park/unpark with lookups in the tree under cover of a RW lock.

- lwp_wait(): if waiting on a specific LWP, find the LWP via tree lookup and
return EINVAL if it's detached, not ESRCH.

- Group the locks in struct proc at the end of the struct in their own cache
line.

- Add some comments.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.357 12-Oct-2019 kamil

branches: 1.357.2;
Remove now unused p_oppid from struct proc


# 1.356 30-Sep-2019 kamil

Move TRAP_CHLD/TRAP_LWP ptrace information from struct proc to siginfo

Storing struct ptrace_state information inside struct proc was vulnerable
to synchronization bugs, as multiple events emitted at the same time were
overwriting each other.

Cache the original parent process id in p_oppid. Reusing p_opptr here is
in theory prone to a slight race condition.

Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for calls
that request the value when no appropriate event has been registered.

Add an alternative approach to check the ptrace_state information, directly
from the siginfo_t value returned from PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compat with older NetBSD and
OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.

Add a couple of compile-time asserts for assumptions in the code.

No functional change intended in existing ptrace(2) software.

All ATF ptrace(2) and ATF GDB tests pass.

This change improves reliability of the threading ptrace(2) code.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.355 15-Jul-2019 pgoyette

Move a comment line to get it next to the line it describes, avoiding
intervening unrelated text.

NFCI


# 1.354 21-Jun-2019 kamil

Eliminate PS_NOTIFYSTOP remnants from the kernel

This flag used to be useful in /proc (BSD4.4-style) debugging semantics.
Traced child events were notified without signaling the parent.

This property was removed in NetBSD-8.0 and had no users.

This change simplifies the signal code, removing dead branches.

NFCI


# 1.353 11-Jun-2019 kamil

Add support for PTRACE_POSIX_SPAWN to report posix_spawn(3) events

posix_spawn(3) is a first class syscall in NetBSD, different from
(V)FORK+EXEC as these operations are executed in one go. This differs from
Linux and FreeBSD, where posix_spawn(3) is implemented with existing kernel
primitives (clone(2), vfork(2), exec(3)) inside libc.

Typically LLDB and GDB software is aware of FORK/VFORK events. As discussed
with the LLDB community, instead of slicing the posix_spawn(3) operation
into phases emulating (V)FORK+EXEC(+VFORK_DONE) and returning intermediate
state to the debugger, that might have abnormal state, introduce new event
type: PTRACE_POSIX_SPAWN.

A debugger implementor can easily map it into existing fork+exec semantics
or treat as a distinct event.

There is no functional change for existing debuggers as there was no
support for reporting posix_spawn(3) events on the kernel side.


Revision tags: phil-wifi-20190609 isaki-audio2-base
# 1.352 06-Apr-2019 kamil

Centralized shared part of child_return() into MI part

Add a new function md_child_return() for MD specific bits only.

New child_return() is now part of MI and central code that handles
uniformly tracing code (KTR and ptrace(2)).

Synchronize value passed to ktrsysret() among ports to SYS_fork. This is
a traditional value and accessing p_lflag to check for PL_PPWAIT shall
use locking against proc_lock. Returning SYS_fork vs SYS_vfork still isn't
correct enough as there are more entry points to forking code. Instead of
making it too good, just settle with plain SYS_fork for all ports.


# 1.351 01-Mar-2019 christos

PR/53998: Joel Bertrand: Limit the number of semaphores on a
per-user basis not a per-process. We cannot really keep track on
a per-process basis because a parent process can create the semaphore
and a child can free it taking credit for it. There is also a
similar issue about resource exhaustion if we limited the number
of lwps per process as opposed to per user (which we don't).


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.350 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
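
The tri-state policy listed above can be sketched as a small decision
function. The enum and function names are hypothetical; only the three
values and the KASLR-dependent defaults come from the log message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three kern.expose_address settings. */
enum sk_expose {
	SK_EXPOSE_NONE = 0,	/* no access */
	SK_EXPOSE_KMEM = 1,	/* processes with /dev/kmem open */
	SK_EXPOSE_ALL  = 2	/* everyone */
};

/* Default described in the log: 0 for KASLR kernels, 1 otherwise. */
static int
sk_expose_default(bool kaslr)
{
	return kaslr ? SK_EXPOSE_NONE : SK_EXPOSE_KMEM;
}

/* Decide whether kernel addresses may be shown to the caller. */
static bool
sk_may_expose(int policy, bool has_kmem_open)
{
	switch (policy) {
	case SK_EXPOSE_ALL:
		return true;
	case SK_EXPOSE_KMEM:
		return has_kmem_open;
	default:
		return false;
	}
}

/* Small self-check; returns 1 when every expectation holds. */
static int
sk_expose_selfcheck(void)
{
	assert(sk_expose_default(true) == SK_EXPOSE_NONE);
	assert(sk_expose_default(false) == SK_EXPOSE_KMEM);
	return !sk_may_expose(SK_EXPOSE_NONE, true) &&
	    sk_may_expose(SK_EXPOSE_KMEM, true) &&
	    !sk_may_expose(SK_EXPOSE_KMEM, false) &&
	    sk_may_expose(SK_EXPOSE_ALL, false);
}
```

Evaluating the policy once per sysctl request and passing the result
down, rather than re-reading it for every process, matches the
efficiency point made in the list.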


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.349 10-Aug-2018 pgoyette

Allow syscall_establish() to install new syscalls when the existing
entry-point is either sys_nomodule or sys_nosys. Update the
makesyscalls.sh script to create a const array of bits to allow
syscall_disestablish() to properly restore the original entry-point.
Update all the initializers of struct emul to initialize the pointer
to the bit array struct emul.

XXX Regen of all files created by makesyscalls.sh will come soon,
XXX followed by a kernel version bump (since struct emul is being
XXX modified).

This commit should address PR kern/45781 and also removes the need
for the work-around for that PR in file

sys/arch/usermode/modules/syscallemu/syscallemu.c


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.348 09-May-2018 kre

branches: 1.348.2;

Cause a process's user and system times to become non-decreasing.

This alters the invented values (ie: statistically calculated)
that are returned - for small values, the values are likely going to
be different than they were, but that's largely nonsense anyway
(except that the sum of utime & stime does equal cpu time consumed
by the process). Once the values get large enough to be meaningful
the difference made by this change will be in the noise, and irrelevant.

This needs a couple of additions to struct proc, so we are now into 8.99.17
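
One generic way to make reported values non-decreasing is to remember
what was last reported and never go below it. The sketch below shows
only that clamping idea, with hypothetical names; it is not the actual
calculation used in the kernel, and it does not attempt to preserve the
utime + stime sum mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-process record of the last values handed out. */
struct sk_times {
	uint64_t last_utime;
	uint64_t last_stime;
};

/*
 * Clamp freshly estimated user/system times so that neither is ever
 * reported lower than the value reported previously.
 */
static void
sk_report_times(struct sk_times *st, uint64_t est_u, uint64_t est_s,
    uint64_t *u, uint64_t *s)
{
	if (est_u < st->last_utime)
		est_u = st->last_utime;
	if (est_s < st->last_stime)
		est_s = st->last_stime;
	st->last_utime = est_u;
	st->last_stime = est_s;
	*u = est_u;
	*s = est_s;
}

/* Returns 1 if a lower second estimate is clamped as expected. */
static int
sk_times_demo(void)
{
	struct sk_times st = { 0, 0 };
	uint64_t u, s;

	sk_report_times(&st, 10, 5, &u, &s);
	assert(u == 10 && s == 5);
	/* The user-time estimate drops to 8; the report must stay at 10. */
	sk_report_times(&st, 8, 9, &u, &s);
	return u == 10 && s == 9;
}
```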


# 1.347 06-May-2018 kamil

Remove an element from struct emul: e_tracesig

e_tracesig used to be implemented for Darwin compat. Nowadays the Darwin
compatibility layer is gone and there are no other users.

This functionality isn't used where it shall be used in the existing
codebase.

If we want to emulate debugging interfaces in compat layers we would need
to implement that from scratch anyway. We would need to be bug compatible
with other OSes too.

Proposed on tech-kern@.

Welcome to NetBSD 8.99.16!

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422
# 1.346 19-Apr-2018 christos

s/static inline/static __inline/g for consistency with other include
headers.


# 1.345 16-Apr-2018 kamil

Remove the rnewprocp argument from fork1(9)

It's now unused and it can cause use-after-free scenarios as noted by
<Mateusz Guzik>.

Reference: http://mail-index.netbsd.org/tech-kern/2017/09/08/msg022267.html

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.344 09-Jan-2018 maya

branches: 1.344.2;
remove struct emul's e_fault.

It used to be used by COMPAT_IRIX for the purpose of overriding
uvm_fault (only implemented in MIPS), now removed.

Ride 8.99.12 version bump.


Revision tags: tls-maxphys-base-20171202
# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->p_path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not suitable for modern
applications beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change affects neither the other procfs files nor the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

branches: 1.340.6;
factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

The usage of this interface is modeled after the FreeBSD API.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, so remove its traces without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
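
The size claim can be checked at compile time with a small sketch. The
typedefs below assume that pid_t and lwpid_t are both int32_t-like, as
the message states; the sk_ prefixed type names are hypothetical and
stand in for the old and new layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: both ID types are int32_t-like, per the log message. */
typedef int32_t sk_pid_t;
typedef int32_t sk_lwpid_t;

struct sk_old_state {			/* layout before the change */
	int pe_report_event;
	sk_pid_t pe_other_pid;
};

struct sk_new_state {			/* layout after the change */
	int pe_report_event;
	union {
		sk_pid_t _pe_other_pid;
		sk_lwpid_t _pe_lwp;
	} _option;
};

/* Same total size, and the old member sits at the same offset. */
_Static_assert(sizeof(struct sk_old_state) == sizeof(struct sk_new_state),
    "ptrace_state size must not change");
_Static_assert(offsetof(struct sk_old_state, pe_other_pid) ==
    offsetof(struct sk_new_state, _option._pe_other_pid),
    "pe_other_pid must stay at the same offset");

/* Runtime view of the same check; returns 1 when the layouts agree. */
static int
sk_layout_compatible(void)
{
	return sizeof(struct sk_old_state) == sizeof(struct sk_new_state) &&
	    offsetof(struct sk_old_state, pe_other_pid) ==
	    offsetof(struct sk_new_state, _option);
}
```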


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
parent gives birth to new process child and stops till it exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE notifies a debugger that a parent has
resumed after a vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
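
A minimal sketch of what such a helper might look like follows. The
flag value, structure and function names are hypothetical; the real
macro and process flag in <sys/proc.h> may differ. The point is only
that code sizing user-space pointer arrays asks the process, not the
kernel's own sizeof(void *).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PK_32	0x1	/* hypothetical "32-bit process" flag */

struct sk_proc {
	int p_flag;
};

/* Pointer size as seen by the process, not by the (64-bit) kernel. */
static size_t
sk_proc_ptrsz(const struct sk_proc *p)
{
	return (p->p_flag & SK_PK_32) ? sizeof(uint32_t) : sizeof(void *);
}

/* Bytes needed for an argv-style vector of argc pointers plus a NULL. */
static size_t
sk_argv_bytes(const struct sk_proc *p, size_t argc)
{
	return (argc + 1) * sk_proc_ptrsz(p);
}

/* Returns 1 if both the emulated and native cases size correctly. */
static int
sk_ptrsz_demo(void)
{
	struct sk_proc p32 = { SK_PK_32 };
	struct sk_proc pnative = { 0 };

	assert(sk_proc_ptrsz(&p32) == 4);
	return sk_argv_bytes(&p32, 3) == 16 &&
	    sk_argv_bytes(&pnative, 3) == 4 * sizeof(void *);
}
```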


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
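
To illustrate how the three separate pieces relate to the old composite
value, the sketch below rebuilds a wait(2)-style status from them. It
assumes the traditional BSD encoding (exit code in bits 8-15, signal in
the low 7 bits, core flag 0200); the SK_ macros and the function name
are hypothetical and defined locally rather than taken from any header.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed traditional BSD wait status encoding, defined locally. */
#define SK_W_EXITCODE(ret, sig)	((ret) << 8 | (sig))
#define SK_WCOREFLAG		0200

/*
 * Compose a wait(2)-style status from the split fields:
 * xexit is the exit code, xsig the terminating signal (0 if none),
 * coredumped tells whether a core file was written.
 */
static int
sk_wait_status(int xexit, int xsig, bool coredumped)
{
	int status = SK_W_EXITCODE(xexit, xsig);

	/* The core flag is only meaningful for signal-terminated processes. */
	assert(!coredumped || xsig != 0);
	if (coredumped)
		status |= SK_WCOREFLAG;
	return status;
}
```

For example, a normal exit with code 3 yields 0x300, and a process
killed by signal 11 that dumped core yields 11 | 0200, that is 139.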


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, the lock is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give the same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time.


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's a more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but do no more.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is not likely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
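
A rough illustration of why a fixed-point estimate can decay more slowly
than an integer one. This is only a sketch, not the kernel code: the names
demo_decay_int and demo_decay_fix, the DEMO_FSHIFT value of 11, and the
num/den decay factor are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_FSHIFT 11                    /* assumed number of fraction bits */
#define DEMO_FSCALE (1u << DEMO_FSHIFT)   /* fixed-point representation of 1.0 */

/*
 * Integer estimate: scale by num/den (num < den). Truncating division
 * means any non-zero value drops by at least 1 per step.
 */
uint32_t demo_decay_int(uint32_t est, uint32_t num, uint32_t den)
{
    return (uint32_t)(((uint64_t)est * num) / den);
}

/*
 * Fixed-point estimate: same scaling, but the value carries DEMO_FSHIFT
 * fraction bits, so a small estimate keeps a fractional remainder
 * instead of collapsing to zero.
 */
uint32_t demo_decay_fix(uint32_t est_fx, uint32_t num, uint32_t den)
{
    return (uint32_t)(((uint64_t)est_fx * num) / den);
}

/* Small self-check: an estimate of 1 decayed by a factor of 20/21. */
void demo_decay_selfcheck(void)
{
    assert(demo_decay_int(1, 20, 21) == 0);
    assert(demo_decay_fix(1 * DEMO_FSCALE, 20, 21) > 0);
}
```

With a decay factor close to one, the integer form loses a whole unit per
step while the fixed-point form loses only a fraction of one.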


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
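
The marker-skipping iteration described above can be sketched roughly as
follows. This is a simplified stand-in, not the real macro: struct
demo_proc, its is_marker field, demo_skip_markers() and
DEMO_PROCLIST_FOREACH are hypothetical names, and a hand-rolled singly
linked list is used instead of the kernel's list macros.

```c
#include <assert.h>
#include <stddef.h>

struct demo_proc {
    struct demo_proc *next;
    int is_marker;      /* hypothetical flag: non-zero for placeholder entries */
    int pid;
};

/* Return the first non-marker entry at or after p, or NULL. */
struct demo_proc *demo_skip_markers(struct demo_proc *p)
{
    while (p != NULL && p->is_marker)
        p = p->next;
    return p;
}

/* Iterate over a list, silently stepping over marker entries. */
#define DEMO_PROCLIST_FOREACH(var, head)                        \
    for ((var) = demo_skip_markers(head); (var) != NULL;        \
         (var) = demo_skip_markers((var)->next))

/* Count the real (non-marker) entries on the list. */
int demo_count_procs(struct demo_proc *head)
{
    struct demo_proc *p;
    int n = 0;

    DEMO_PROCLIST_FOREACH(p, head)
        n++;
    return n;
}

/* Small self-check: a marker between two real entries is not counted. */
void demo_proclist_selfcheck(void)
{
    struct demo_proc c = { NULL, 0, 3 };
    struct demo_proc m = { &c, 1, 0 };
    struct demo_proc a = { &m, 0, 1 };

    assert(demo_count_procs(&a) == 2);
}
```

Callers of the macro never see a marker, so an iterator that parks a marker
in the list to remember its position does not disturb other walkers.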


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep while awaiting the debugger's opinion
about the signal; this is not a problem. The hook is hence located in
issignal, at the place where normally SIGCHLD is sent to the debugger,
whereas the traced process is stopped. If the hook returns 0, we bypass
those operations, the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception), and stop the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework.
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.
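
The calling convention described above (an explicit new process, or NULL to
fall back to chooseproc()) can be sketched as follows. This is a toy model
under stated assumptions: the toy_* names, the single LIFO run queue and the
return value are hypothetical, and where the real chooseproc() waits for a
process to appear, the toy version simply returns NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; the real struct proc and run queues are far richer. */
struct toy_proc {
	int		 pid;
	struct toy_proc	*next;		/* run-queue linkage */
};

static struct toy_proc *toy_runq;	/* single toy run queue (LIFO) */
static struct toy_proc *toy_curproc;	/* process currently "on the CPU" */

/* Put a process on the toy run queue. */
static void
toy_setrunqueue(struct toy_proc *p)
{
	p->next = toy_runq;
	toy_runq = p;
}

/* Take the next process off the run queue, or NULL if it is empty. */
static struct toy_proc *
toy_chooseproc(void)
{
	struct toy_proc *p = toy_runq;

	if (p != NULL) {
		toy_runq = p->next;
		p->next = NULL;
	}
	return p;
}

/* Returns 1 if a different process became current, 0 otherwise. */
static int
toy_mi_switch(struct toy_proc *p, struct toy_proc *newp)
{
	assert(p == toy_curproc);
	if (newp == NULL)
		newp = toy_chooseproc();
	if (newp == NULL || newp == p)
		return 0;	/* nothing runnable, or "switching to self" */
	toy_curproc = newp;
	return 1;
}
```

The early return for newp == p is where the "switching to self" optimisation
mentioned in the commit message would apply.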


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct emul to provide an emulation-specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.
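
The per-signal descriptor and trampoline selection described above can be
sketched as follows. This is a simplified model, not the kernel's definitions:
the toy_* names are hypothetical, the handler pointer stands in for the full
sigaction, and the check rejecting a non-zero version without a trampoline is
an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NSIG 32	/* hypothetical signal count for this sketch */

typedef void (*toy_handler_t)(int);

/* One entry per signal: the action plus the trampoline and its version. */
struct toy_sigdesc {
	toy_handler_t	 sd_handler;	/* stands in for the full sigaction */
	const void	*sd_tramp;	/* userland trampoline, NULL if legacy */
	int		 sd_vers;	/* 0 = legacy kernel-provided trampoline */
};

struct toy_sigacts {
	struct toy_sigdesc sa_sigdesc[TOY_NSIG];
};

/* Placeholder for the legacy kernel-provided trampoline. */
static const char toy_kernel_tramp[] = "legacy";

/* Do-nothing handler, used only to exercise the sketch. */
static void
toy_noop_handler(int sig)
{
	(void)sig;
}

/* Register a handler together with its trampoline and version. */
static int
toy_sigaction1(struct toy_sigacts *ps, int sig, toy_handler_t handler,
    const void *tramp, int vers)
{
	if (sig <= 0 || sig >= TOY_NSIG)
		return -1;
	if (vers != 0 && tramp == NULL)
		return -1;	/* assumption: non-legacy needs a trampoline */
	ps->sa_sigdesc[sig].sd_handler = handler;
	ps->sa_sigdesc[sig].sd_tramp = tramp;
	ps->sa_sigdesc[sig].sd_vers = vers;
	return 0;
}

/*
 * What a sendsig() would do: look the handler up in the sigacts (it is
 * no longer passed in) and pick the trampoline based on the version.
 */
static const void *
toy_select_tramp(const struct toy_sigacts *ps, int sig, toy_handler_t *hp)
{
	const struct toy_sigdesc *sd;

	assert(sig > 0 && sig < TOY_NSIG);
	sd = &ps->sa_sigdesc[sig];
	*hp = sd->sd_handler;
	return sd->sd_vers == 0 ? toy_kernel_tramp : sd->sd_tramp;
}
```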


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specifics should now be handled in OS specific code. Linux
has been done, but other emulations still need to be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data collected so far:

                                   Net  Free  Open  Linux  SunOS  AIX  OSF1  Darwin
send SIGIO to write end of pipe    Y    N     N     N      N      N    Y     Y
send SIGIO to read end of pipe     Y    Y     N     N      N      ?    Y     ?
send SIGIO to write end of socket  Y    Y     Y     N      N      Y    Y     Y
send SIGIO to read end of socket   Y    Y     Y     Y      Y      ?    Y     ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke the
build of several emulation packages


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.
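
A minimal sketch of how the hooks described above might be driven, with
NULL meaning "no hook". The toy_* names, the hook signatures and the example
"foo" emulation are hypothetical; only the ordering (the old emulation's exit
hook, then the new emulation's exec hook) follows the commit message.

```c
#include <assert.h>
#include <stddef.h>

struct toy_proc;

/* Toy emulation switch entry with the three per-process hooks. */
struct toy_emul {
	const char *e_name;
	void (*e_proc_exec)(struct toy_proc *);
	void (*e_proc_fork)(struct toy_proc *child, struct toy_proc *parent);
	void (*e_proc_exit)(struct toy_proc *);
};

struct toy_proc {
	const struct toy_emul	*p_emul;
	void			*p_emuldata;	/* per-process emulation data */
};

/* 1 while some process is using the example "foo" emulation. */
static int toy_foo_state;

static void
toy_foo_exec(struct toy_proc *p)
{
	toy_foo_state = 1;
	p->p_emuldata = &toy_foo_state;
}

static void
toy_foo_exit(struct toy_proc *p)
{
	toy_foo_state = 0;
	p->p_emuldata = NULL;
}

static const struct toy_emul toy_emul_native = { "native", NULL, NULL, NULL };
static const struct toy_emul toy_emul_foo = {
	"foo", toy_foo_exec, NULL, toy_foo_exit
};

/*
 * What an execve path would do with the hooks: if the emulation changes,
 * let the old one clean up, then let the new one set up.
 */
static void
toy_exec_switch(struct toy_proc *p, const struct toy_emul *newemul)
{
	const struct toy_emul *old;

	assert(p != NULL && newemul != NULL);
	old = p->p_emul;
	if (old != newemul && old->e_proc_exit != NULL)
		old->e_proc_exit(p);
	p->p_emul = newemul;
	if (newemul->e_proc_exec != NULL)
		newemul->e_proc_exec(p);
}
```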


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
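
The compatibility macro described above can be sketched as a wrapper that
passes a NULL interlock. The toy_* names and the toy lock are hypothetical,
the sleep itself is elided, and retaking the interlock before returning is an
assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stddef.h>

/* Toy lock; the real simple lock is a machine-dependent spin lock. */
struct toy_simplelock {
	int		locked;
	unsigned	releases;	/* how often a sleeper dropped it */
};

static int
toy_ltsleep(const void *ident, int priority, const char *wmesg, int timo,
    struct toy_simplelock *interlock)
{
	(void)ident;
	(void)priority;
	(void)wmesg;
	(void)timo;

	/* The real code locks the scheduler here, before the release. */
	if (interlock != NULL) {
		assert(interlock->locked);
		interlock->locked = 0;
		interlock->releases++;
	}

	/* ... the process would sleep here until a wakeup on ident ... */

	/* Assumption of this sketch: the interlock is retaken on wakeup. */
	if (interlock != NULL)
		interlock->locked = 1;
	return 0;
}

/* Old API: same call without an interlock. */
#define toy_tsleep(ident, pri, wmesg, timo) \
	toy_ltsleep((ident), (pri), (wmesg), (timo), NULL)
```

Because the interlock is dropped only after the scheduler is locked, an
awakener that takes the same interlock cannot slip in between the sleeper's
check of its condition and the sleep itself.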


# 1.97 31-May-2000 thorpej

Track which CPU a process is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.
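
The element-size argument of KERN_PROC2 described above can be sketched as a
copy loop that hands each caller records of the size it asked for. The
toy_kinfo_proc2 layout and toy_fill_procs() are hypothetical and much smaller
than the real structure; zero-filling any tail beyond the kernel's record is
an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy record; imagine p_newfield was added after old binaries were built. */
struct toy_kinfo_proc2 {
	int p_pid;
	int p_ppid;
	int p_newfield;
};

/*
 * Copy records into a caller buffer whose elements are elemsize bytes
 * each.  A caller built against an older, shorter structure gets a
 * truncated prefix; a larger element is zero-filled past the end of the
 * kernel's record.  Returns the number of elements written.
 */
static size_t
toy_fill_procs(void *out, size_t outlen, size_t elemsize,
    const struct toy_kinfo_proc2 *src, size_t nsrc)
{
	size_t copy = elemsize < sizeof(*src) ? elemsize : sizeof(*src);
	size_t n = 0;
	char *dp = out;

	if (elemsize == 0)
		return 0;
	while (n < nsrc && (n + 1) * elemsize <= outlen) {
		memset(dp, 0, elemsize);
		memcpy(dp, &src[n], copy);
		dp += elemsize;
		n++;
	}
	return n;
}
```

This only works if new fields are appended at the end of the structure, so
that the prefix an old binary sees keeps its layout.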


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
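
The sizing rule described above (physical memory divided by 4, then clamped
to machine-dependent bounds) amounts to the following arithmetic. The
function name and parameters are hypothetical; the real code works from
kernel globals and the NKMEMPAGES* options.

```c
#include <assert.h>

/*
 * Pick the number of kmem_map pages: a quarter of physical memory,
 * clamped to [min_pages, max_pages].
 */
static unsigned long
toy_nkmempages(unsigned long physmem_pages, unsigned long min_pages,
    unsigned long max_pages)
{
	unsigned long n = physmem_pages / 4;

	assert(min_pages <= max_pages);
	if (n > max_pages)
		n = max_pages;
	if (n < min_pages)
		n = min_pages;
	return n;
}
```

For example, with 1000 pages of physical memory and bounds of 100 and 200
pages, the quarter (250) exceeds the upper bound, so the result is 200.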


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). Process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
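
The two-stage exit state described above can be sketched as follows. The
TOY_* state values, struct toy_proc and toy_reap() are hypothetical stand-ins;
only the shape of the zombie test (SZOMB or SDEAD) is taken from the commit
message.

```c
#include <assert.h>

/* Hypothetical state values for this sketch only. */
enum toy_pstat {
	TOY_SRUN = 2,
	TOY_SZOMB = 5,	/* reaped by the reaper, waiting for wait4 */
	TOY_SDEAD = 6	/* dead, but not yet processed by the reaper */
};

struct toy_proc {
	enum toy_pstat p_stat;
};

/* Treat both "dead" and "zombie" as a zombie. */
#define TOY_P_ZOMBIE(p) \
	((p)->p_stat == TOY_SZOMB || (p)->p_stat == TOY_SDEAD)

/* What the reaper does to the state once it has dealt with the process. */
static void
toy_reap(struct toy_proc *p)
{
	assert(p->p_stat == TOY_SDEAD);
	p->p_stat = TOY_SZOMB;
}
```

Code that only cares whether a process has exited can test the macro and does
not need to know which of the two stages the process is in.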


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is the
header files as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
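
The effect of biasing by NZERO can be sketched as follows. This is a hedged model only: the `sk_` names are hypothetical, and the -20..20 user-visible range is an assumption based on the usual setpriority(2) limits rather than something stated in this log entry.

```c
#include <assert.h>

/* Hypothetical constants; the -20..20 range is an assumption. */
#define SK_NZERO    20
#define SK_PRIO_MIN (-20)
#define SK_PRIO_MAX 20

/* Clamp a user-visible nice value and bias it so the stored value is >= 0. */
unsigned char sk_nice_to_stored(int nice)
{
    if (nice < SK_PRIO_MIN)
        nice = SK_PRIO_MIN;
    if (nice > SK_PRIO_MAX)
        nice = SK_PRIO_MAX;
    assert(nice + SK_NZERO >= 0);   /* holds because SK_NZERO == -SK_PRIO_MIN */
    return (unsigned char)(nice + SK_NZERO);
}

/* Undo the bias to recover the user-visible nice value. */
int sk_stored_to_nice(unsigned char stored)
{
    return (int)stored - SK_NZERO;
}
```

With a bias of 20 the stored value falls in 0..40, so an unsigned char field is wide enough and never has to represent a negative number.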


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.
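
A hold count nests where a single flag bit cannot: two independent holders can each take and drop a hold without one release cancelling the other. The sketch below models only that property; the structure and the `sk_` functions are hypothetical, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of a process structure. */
struct sk_proc {
    unsigned holdcnt;   /* number of outstanding holds */
};

/* Take a hold; the process must stay in core while any hold exists. */
void sk_hold(struct sk_proc *p) { p->holdcnt++; }

/* Drop a hold previously taken with sk_hold(). */
void sk_release(struct sk_proc *p)
{
    assert(p->holdcnt > 0);
    p->holdcnt--;
}

/* The process may be swapped out only when nobody holds it. */
bool sk_swappable(const struct sk_proc *p) { return p->holdcnt == 0; }
```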


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location , and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.370 09-May-2022 wiz

fix typo in comment


# 1.369 10-Oct-2021 thorpej

Changes to make EVFILT_PROC MP-safe:

Because the locking protocol around processes is somewhat complex
compared to other events that can be posted on kqueues, introduce
new functions for posting NOTE_EXEC, NOTE_EXIT, and NOTE_FORK,
rather than just using the generic knote() function. These functions
KASSERT() their locking expectations, and deal with other complexities
for each situation.

knote_proc_fork(), in particular, needs to handle NOTE_TRACK, which
requires allocation of a new knote to attach to the child process. We
don't want to be allocating memory while holding the parent's p_lock.
Furthermore, we also have to attach the tracking note to the child
process, which means we have to acquire the child's p_lock.

So, to handle all this, we introduce some additional synchronization
infrastructure around the 'knote' structure:

- Add the ability to mark a knote as being in a state of flux. Knotes
in this state are guaranteed not to be detached/deleted, thus allowing
a code path to drop other locks after putting a knote in this state.

- Code paths that wish to detach/delete a knote must first check if the
knote is in-flux. If so, they must wait for it to quiesce. Because
multiple threads of execution may attempt this concurrently, a mechanism
exists for a single LWP to claim the detach responsibility; all other
threads simply wait for the knote to disappear before they can make
further progress.

- When kqueue_scan() encounters an in-flux knote, it simply treats the
situation just like encountering another thread's queue marker -- wait
for the flux to settle and continue on.

(The "in-flux knote" idea was inspired by FreeBSD, but this works differently
from their implementation, as the two kqueue implementations have diverged
quite a bit.)
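
The state rules described above can be sketched as plain flag bookkeeping. This is a single-threaded model under stated assumptions: it omits all locking, sleeping and wakeups, the names are hypothetical, and the order in which the detach claim and the flux check happen is a guess rather than the kernel's actual sequence.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status bits; the kernel's real flag names may differ. */
enum {
    SK_INFLUX     = 1 << 0,   /* knote must not be detached/deleted */
    SK_WILLDETACH = 1 << 1    /* one caller has claimed the detach */
};

struct sk_knote {
    unsigned status;
};

enum sk_detach {
    SK_DETACH_NOW,          /* caller owns the detach and may proceed */
    SK_DETACH_AFTER_FLUX,   /* caller owns it but must wait for flux to end */
    SK_DETACH_NOT_OURS      /* someone else owns it; wait for the knote to go */
};

/* Fails if a detach is already pending, so the caller skips its work. */
bool sk_enter_flux(struct sk_knote *kn)
{
    if (kn->status & SK_WILLDETACH)
        return false;
    assert(!(kn->status & SK_INFLUX));
    kn->status |= SK_INFLUX;
    return true;
}

/* Real code would also wake any thread waiting for the flux to settle. */
void sk_leave_flux(struct sk_knote *kn)
{
    kn->status &= ~(unsigned)SK_INFLUX;
}

/* Only the first caller gets to own the detach. */
enum sk_detach sk_claim_detach(struct sk_knote *kn)
{
    if (kn->status & SK_WILLDETACH)
        return SK_DETACH_NOT_OURS;
    kn->status |= SK_WILLDETACH;
    return (kn->status & SK_INFLUX) ? SK_DETACH_AFTER_FLUX : SK_DETACH_NOW;
}
```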

knote_proc_fork() uses this infrastructure to implement NOTE_TRACK like so:

- Attempt to put the original tracking knote into a state of flux; if this
fails (because the note has a detach pending), we skip all processing
(the original process has lost interest, and we simply won the race).

- Once the note is in-flux, drop the kq and forking process's locks, and
allocate 2 knotes: one to post the NOTE_CHILD event, and one to attach
a new NOTE_TRACK to the child process. Notably, we do NOT go through
kqueue_register() to do this, but rather do all of the work directly
and KASSERT() our assumptions; this allows us to directly control our
interaction with locks. All memory allocations here are performed with
KM_NOSLEEP, in order to prevent holding the original knote in-flux
indefinitely.

- Because the NOTE_TRACK use case adds knotes to kqueues through a
sort of back-door mechanism, we must serialize with the closing of
the destination kqueue's file descriptor, so steal another bit from
the kq_count field to notify other threads that a kqueue is on its
way out to prevent new knotes from being enqueued while the close
path detaches them.

In addition to fixing EVFILT_PROC's reliance on KERNEL_LOCK, this also
fixes a long-standing bug whereby a NOTE_CHILD event could be dropped
if the child process exited before the interested process received the
NOTE_CHILD event (the same knote would be used to deliver the NOTE_EXIT
event, and would clobber the NOTE_CHILD's 'data' field).

Add a bunch of comments to explain what's going on in various critical
sections, and sprinkle additional KASSERT()s to validate assumptions
in several more locations.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.368 05-Dec-2020 thorpej

Refactor interval timers to make it possible to support types other than
the BSD/POSIX per-process timers:

- "struct ptimer" is split into "struct itimer" (common interval timer
data) and "struct ptimer" (per-process timer data, which contains a
"struct itimer").

- Introduce a new "struct itimer_ops" that supplies information about
the specific kind of interval timer, including its processing
queue, the softint handle used to schedule processing, the function
to call when the timer fires (which adds it to the queue), and an
optional function to call when the CLOCK_REALTIME clock is changed by
a call to clock_settime() or settimeofday().

- Rename some functions to clearly identify what they're operating on
(ptimer vs itimer).

- Use kmem(9) to allocate ptimer-related structures, rather than having
dedicated pools for them.

Welcome to NetBSD 9.99.77.


# 1.367 23-May-2020 ad

branches: 1.367.2;
Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.366 23-May-2020 ad

- Replace pid_table_lock with a lockless lookup covered by pserialize, with
the "writer" side being pid_table expansion. The basic idea is that when
doing an LWP lookup there is usually already a lock held (p->p_lock), or a
spin mutex that needs to be taken (l->l_mutex), and either can be used to
get the found LWP stable and confidently determine that all is correct.

- For user processes LSLARVAL implies the same thing as LSIDL ("not visible
by ID"), and lookup by ID in proc0 doesn't really happen. In-tree the new
state should be understood by top(1), the tty subsystem and so on, and
would attract the attention of 3rd party kernel grovellers in time, so
remove it and just rely on LSIDL.


# 1.365 07-May-2020 kamil

On debugger attach to a prestarted process don't report SIGTRAP

Introduce PSL_TRACEDCHILD that indicates tracking of birth of a process.
A freshly forked process checks whether it is traced and if so, reports
SIGTRAP + TRAP_CHLD event to a debugger as a result of tracking forks-like
events. There is a time window when a debugger can attach to a newly
created process and receive SIGTRAP + TRAP_CHLD instead of SIGSTOP.

Fixes races in t_ptrace_wait* tests when a test hangs or misbehaves,
especially the ones reported in tracer_sysctl_lookup_without_duplicates.


# 1.364 29-Apr-2020 thorpej

- proc_find() retains traditional semantics of requiring the canonical
PID to look up a proc. Add a separate proc_find_lwpid() to look up a
proc by the ID of any of its LWPs.
- Add proc_find_lwp_acquire_proc(), which enables looking up the LWP
*and* a proc given the ID of any LWP. Returns with the proc::p_lock
held.
- Rewrite lwp_find2() in terms of proc_find_lwp_acquire_proc(), and
allow the proc to be wildcarded, rather than just curproc or a specific
proc.
- lwp_find2() now subsumes the original intent of lwp_getref_lwpid(), but
in a much nicer way, so garbage-collect the remnants of that recently
added mechanism.


Revision tags: bouyer-xenpvh-base2
# 1.363 24-Apr-2020 thorpej

Overhaul the way LWP IDs are allocated. Instead of each process having its
own LWP ID space, LWP IDs now come from the same number space as PIDs. The
lead LWP of a process gets the PID as its LID. If a multi-LWP process's
lead LWP exits, the PID persists for the process.

In addition to providing system-wide unique thread IDs, this also lets us
eliminate the per-process LWP radix tree, and some associated locks.

Remove the separate "global thread ID" map added previously; it is no longer
needed to provide this functionality.

Nudged in this direction by ad@ and chs@.


Revision tags: phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.362 06-Apr-2020 kamil

branches: 1.362.2;
Reintroduce struct proc::p_oppid

Relying on p_opptr is not safe as there is a race between:
- spawner giving birth to a child process and being killed
- spawnee accessing p_opptr and reporting TRAP_CHLD

PR kern/54786 by Andreas Gustafsson


# 1.361 05-Apr-2020 christos

There is no "s" lock.


# 1.360 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.359 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.358 29-Jan-2020 ad

- Track LWPs in a per-process radixtree. It uses no extra memory in the
single threaded case. Replace scans of p->p_lwps with lookups in the
tree. Find free LIDs for new LWPs in the tree. Replace the hashed sleep
queues for park/unpark with lookups in the tree under cover of a RW lock.

- lwp_wait(): if waiting on a specific LWP, find the LWP via tree lookup and
return EINVAL if it's detached, not ESRCH.

- Group the locks in struct proc at the end of the struct in their own cache
line.

- Add some comments.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.357 12-Oct-2019 kamil

branches: 1.357.2;
Remove now unused p_oppid from struct proc


# 1.356 30-Sep-2019 kamil

Move TRAP_CHLD/TRAP_LWP ptrace information from struct proc to siginfo

Storing struct ptrace_state information inside struct proc was vulnerable
to synchronization bugs, as multiple events emitted at the same time were
overwriting one another.

Cache the original parent process id in p_oppid. Reusing p_opptr here is
in theory prone to a slight race condition.

Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for calls
prompting for the value in cases where no appropriate event was
registered.

Add an alternative approach to check the ptrace_state information, directly
from the siginfo_t value returned from PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compat with older NetBSD and
OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.

Add a couple of compile-time asserts for assumptions in the code.

No functional change intended in existing ptrace(2) software.

All ATF ptrace(2) and ATF GDB tests pass.

This change improves reliability of the threading ptrace(2) code.


Revision tags: netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.355 15-Jul-2019 pgoyette

Move a comment line to get it next to the line it describes, avoiding
intervening unrelated text.

NFCI


# 1.354 21-Jun-2019 kamil

Eliminate PS_NOTIFYSTOP remnants from the kernel

This flag used to be useful in /proc (BSD4.4-style) debugging semantics.
Traced child events were notified without signaling the parent.

This property was removed in NetBSD-8.0 and had no users.

This change simplifies the signal code, removing dead branches.

NFCI


# 1.353 11-Jun-2019 kamil

Add support for PTRACE_POSIX_SPAWN to report posix_spawn(3) events

posix_spawn(3) is a first class syscall in NetBSD, different to
(V)FORK+EXEC as these operations are executed in one go. This differs from
Linux and FreeBSD, where posix_spawn(3) is implemented with existing kernel
primitives (clone(2), vfork(2), exec(3)) inside libc.

Typically LLDB and GDB software is aware of FORK/VFORK events. As discussed
with the LLDB community, instead of slicing the posix_spawn(3) operation
into phases emulating (V)FORK+EXEC(+VFORK_DONE) and returning intermediate
state to the debugger, that might have abnormal state, introduce new event
type: PTRACE_POSIX_SPAWN.

A debugger implementor can easily map it into existing fork+exec semantics
or treat as a distinct event.

There is no functional change for existing debuggers as there was no
support for reporting posix_spawn(3) events on the kernel side.


Revision tags: phil-wifi-20190609 isaki-audio2-base
# 1.352 06-Apr-2019 kamil

Centralized shared part of child_return() into MI part

Add a new function md_child_return() for MD specific bits only.

New child_return() is now part of MI and central code that handles
uniformly tracing code (KTR and ptrace(2)).

Synchronize value passed to ktrsysret() among ports to SYS_fork. This is
a traditional value and accessing p_lflag to check for PL_PPWAIT shall
use locking against proc_lock. Returning SYS_fork vs SYS_vfork still isn't
correct enough as there are more entry points to forking code. Instead of
making it too good, just settle with plain SYS_fork for all ports.


# 1.351 01-Mar-2019 christos

PR/53998: Joel Bertrand: Limit the number of semaphores on a
per-user basis not a per-process. We cannot really keep track on
a per-process basis because a parent process can create the semaphore
and a child can free it taking credit for it. There is also a
similar issue about resource exhaustion if we limited the number
of lwps per process as opposed to per user (which we don't).


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.350 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
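
The tri-state policy listed above can be sketched as a small decision function. This is a minimal model: the `sk_` names are hypothetical, and reading "processes with open /dev/kmem" as a property of the requesting process is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three kern.expose_address settings. */
enum sk_expose {
    SK_EXPOSE_NONE = 0,   /* no access */
    SK_EXPOSE_KMEM = 1,   /* access for processes with /dev/kmem open */
    SK_EXPOSE_ALL  = 2    /* access for everyone */
};

/* Default setting as listed in the log: 0 with KASLR, 1 without. */
int sk_default_expose(bool kaslr)
{
    return kaslr ? SK_EXPOSE_NONE : SK_EXPOSE_KMEM;
}

/* Decide whether kernel addresses may be shown to the requester. */
bool sk_may_expose(int level, bool requester_has_kmem_open)
{
    switch (level) {
    case SK_EXPOSE_ALL:
        return true;
    case SK_EXPOSE_KMEM:
        return requester_has_kmem_open;
    default:
        return false;
    }
}
```

Evaluating a check like this once per sysctl request, and reusing the answer for every process in the reply, matches the efficiency point made in the list.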


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.349 10-Aug-2018 pgoyette

Allow syscall_establish() to install new syscalls when the existing
entry-point is either sys_nomodule or sys_nosys. Update the
makesyscalls.sh script to create a const array of bits to allow
syscall_disestablish() to properly restore the original entry-point.
Update all the initializers of struct emul to initialize the pointer
to the bit array struct emul.

XXX Regen of all files created by makesyscalls.sh will come soon,
XXX followed by a kernel version bump (since struct emul is being
XXX modified).

This commit should address PR kern/45781 and also removes the need
for the work-around for that PR in file

sys/arch/usermode/modules/syscallemu/syscallemu.c


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.348 09-May-2018 kre

branches: 1.348.2;

Cause a process's user and system times to become non-decreasing.

This alters the invented values (ie: statistically calculated)
that are returned - for small values, the values are likely going to
be different than they were, but that's largely nonsense anyway
(except that the sum of utime & stime does equal cpu time consumed
by the process). Once the values get large enough to be meaningful
the difference made by this change will be in the noise, and irrelevant.

This needs a couple of additions to struct proc, so we are now into 8.99.17


# 1.347 06-May-2018 kamil

Remove an element from struct emul: e_tracesig

e_tracesig used to be implemented for Darwin compat. Nowadays the Darwin
compatibility layer is gone and there are no other users.

This functionality isn't used where it shall be used in the existing
codebase.

If we want to emulate debugging interfaces in compat layers we would need
to implement that from scratch anyway. We would need to be bug compatible
with other OSes too.

Proposed on tech-kern@.

Welcome to NetBSD 8.99.16!

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422
# 1.346 19-Apr-2018 christos

s/static inline/static __inline/g for consistency with other include
headers.


# 1.345 16-Apr-2018 kamil

Remove the rnewprocp argument from fork1(9)

It's now unused and it can cause use-after-free scenarios as noted by
<Mateusz Guzik>.

Reference: http://mail-index.netbsd.org/tech-kern/2017/09/08/msg022267.html

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.344 09-Jan-2018 maya

branches: 1.344.2;
remove struct emul's e_fault.

It used to be used by COMPAT_IRIX for the purpose of overriding
uvm_fault (only implemented in MIPS), now removed.

Ride 8.99.12 version bump.


Revision tags: tls-maxphys-base-20171202
# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files nor Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

branches: 1.340.6;
factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API and its usage.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, so its remnants are removed without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
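
The size and layout claim above can be checked at compile time. The sketch below mirrors the two structure definitions with hypothetical `sk_` names and assumes, as the text says, that both ID types are 32-bit integers; the accessor macros are prefixed so they do not clash with the old member name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption from the text: both ID types are int32_t-like. */
typedef int32_t sk_pid_t;
typedef int32_t sk_lwpid_t;

/* Layout before the change. */
struct sk_ptrace_state_old {
    int      pe_report_event;
    sk_pid_t pe_other_pid;
};

/* Layout after the change: the second field becomes a union. */
struct sk_ptrace_state_new {
    int pe_report_event;
    union {
        sk_pid_t   _pe_other_pid;
        sk_lwpid_t _pe_lwp;
    } _option;
};

/* Accessor macros in the style of the log entry (prefixed here). */
#define sk_pe_other_pid _option._pe_other_pid
#define sk_pe_lwp       _option._pe_lwp

/* Binary compatibility: same size, and the payload sits at the same offset. */
_Static_assert(sizeof(struct sk_ptrace_state_old) ==
    sizeof(struct sk_ptrace_state_new), "size changed");
_Static_assert(offsetof(struct sk_ptrace_state_old, pe_other_pid) ==
    offsetof(struct sk_ptrace_state_new, _option), "offset changed");

/* Old-style source keeps working through the accessor macro. */
sk_pid_t sk_read_other_pid(const struct sk_ptrace_state_new *s)
{
    return s->sk_pe_other_pid;
}
```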


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
parent gives birth to new process child and stops till it exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE is notification to notify a debugger that a parent has
resumed after vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
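
A minimal sketch of the idea, assuming a per-process flag marks 32-bit
processes. The log does not show the real PROC_PTRSZ() definition, so
EX_PROC_PTRSZ, EX_PK_32 and struct ex_proc below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PK_32	0x0001	/* hypothetical "32-bit process" flag bit */

/* Hypothetical, heavily reduced process structure. */
struct ex_proc {
	int p_flag;
};

/* Pointer size as seen by the target process, not by the kernel. */
#define EX_PROC_PTRSZ(p) \
	(((p)->p_flag & EX_PK_32) ? sizeof(uint32_t) : sizeof(void *))

/* Bytes occupied by an argv-style vector of n pointers in the target. */
static size_t
ex_vector_bytes(const struct ex_proc *p, size_t n)
{
	return n * EX_PROC_PTRSZ(p);
}
```

Code that copies pointer arrays in or out of a process can then size its
buffers from the target's pointer width rather than the kernel's.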


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
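
As an illustration of the split, the sketch below rebuilds a wait(2)-style
status word from the three separate pieces. The struct, the helper and the
bit layout (exit code in bits 8-15, signal in the low 7 bits, core flag
0x80) are assumptions based on the traditional BSD encoding, not taken
from this commit.

```c
#include <assert.h>

#define EX_COREFLAG	0x80	/* assumed position of the core-dump bit */

/* Hypothetical holder for the three pieces p_xstat was split into. */
struct ex_exit_state {
	int xexit;	/* exit code */
	int xsig;	/* terminating signal, 0 if none */
	int coredumped;	/* nonzero if a core was written */
};

/* Compose a status word: signal deaths take precedence over exit codes. */
static int
ex_make_wait_status(const struct ex_exit_state *s)
{
	if (s->xsig != 0)
		return (s->xsig & 0x7f) | (s->coredumped ? EX_COREFLAG : 0);
	return (s->xexit & 0xff) << 8;
}
```

Keeping the pieces separate means callers no longer have to guess whether
a stored value is a composite status or a bare signal number.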


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits the execution of
system calls from the mapped region. This can be used for emulation purposes
or for extra security in the case of generated code.

It's implemented by adding mapping attributes to each uvm_map_entry. These can
then be queried when needed.

Currently MAP_NOSYSCALLS is only implemented for x86, but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add it for their processor ports too.
When this feature is not yet implemented for an architecture,
MAP_NOSYSCALLS is simply ignored with virtually no CPU cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.
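
A userland sketch of the technique, using C11 <stdatomic.h> rather than
the kernel's own atomic operations. The counter and helper names are
hypothetical; the point is only that the count is updated without taking
a lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the global process count. */
static atomic_uint ex_nprocs;

/* Account for a new process; returns the count after the increment. */
static unsigned
ex_proc_created(void)
{
	return atomic_fetch_add(&ex_nprocs, 1) + 1;
}

/* Account for a process going away; returns the count afterwards. */
static unsigned
ex_proc_exited(void)
{
	return atomic_fetch_sub(&ex_nprocs, 1) - 1;
}
```

Each update is a single read-modify-write, so concurrent fork and exit
paths cannot lose an increment or decrement.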


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time.


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout from the proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but no longer do.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 Hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
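
A minimal sketch of why a fixed-point p_estcpu decays more gently under high
load. It assumes the classic BSD decay filter, cpu * 2*load / (2*load + 1),
and an FSHIFT of 11 fractional bits; both are assumptions made for
illustration, and decay_int() and decay_fixpt() are hypothetical helpers
rather than kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define FSHIFT 11                 /* assumed number of fractional bits */
#define FSCALE (1 << FSHIFT)

typedef uint32_t fixpt_t;

/* Integer estimate: truncation drops at least 1 whenever estcpu > 0. */
static uint32_t decay_int(uint32_t estcpu, uint32_t load)
{
    assert(load > 0);
    return (estcpu * 2 * load) / (2 * load + 1);
}

/* Fixed-point estimate: the fractional part survives the division. */
static fixpt_t decay_fixpt(fixpt_t estcpu, uint32_t load)
{
    assert(load > 0);
    return (fixpt_t)(((uint64_t)estcpu * 2 * load) / (2 * load + 1));
}
```

With a load of 20, an integer estimate of 1 decays straight to 0, while the
fixed-point value FSCALE (representing 1.0) decays to about 0.98 * FSCALE.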


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
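
The marker-skipping iteration can be sketched like this. It is only an
illustration under stated assumptions: the list is a hand-rolled singly linked
list rather than the kernel's proclist, and struct proc_sketch, its is_marker
field and PROCLIST_FOREACH_SKETCH are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; marker entries are placeholders, not processes. */
struct proc_sketch {
    int pid;
    int is_marker;
    struct proc_sketch *next;
};

/* Iterate like a plain list walk, but skip any marker entries. */
#define PROCLIST_FOREACH_SKETCH(var, head)                        \
    for ((var) = (head); (var) != NULL; (var) = (var)->next)      \
        if ((var)->is_marker) continue; else

/* Example consumer: sums the pids of all real (non-marker) entries. */
static int sum_pids(struct proc_sketch *head)
{
    struct proc_sketch *p;
    int sum = 0;

    PROCLIST_FOREACH_SKETCH(p, head) {
        sum += p->pid;
    }
    return sum;
}
```

The trailing "else" lets the macro be followed by an ordinary loop body, so
callers never see the marker entries that an in-progress iterator leaves in
the list.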


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep awaiting the debugger's opinion
about the signal without any problem. The hook is hence located in
issignal, at the place where normally SIGCHLD is sent to the debugger
while the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception) and stopping the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct emul to provide an emulation-specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
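
The "NULL means use the default handler" convention can be sketched as
follows. The types here are simplified stand-ins, not the kernel's: struct
emul_sketch, struct proc_sketch, generic_fault(), custom_fault() and
handle_fault() are hypothetical, and the fault handlers take only an address
instead of the full argument list shown in the prototypes above.

```c
#include <assert.h>
#include <stddef.h>

struct proc_sketch;
typedef int (*fault_fn)(struct proc_sketch *, unsigned long va);

/* Hypothetical emulation descriptor with an optional fault hook. */
struct emul_sketch {
    const char *e_name;
    fault_fn e_fault;            /* NULL: use the generic handler */
};

struct proc_sketch {
    const struct emul_sketch *p_emul;
};

/* Stand-in for the generic VM fault handler. */
static int generic_fault(struct proc_sketch *p, unsigned long va)
{
    (void)p; (void)va;
    return 0;
}

/* Stand-in for an emulation-specific handler. */
static int custom_fault(struct proc_sketch *p, unsigned long va)
{
    (void)p; (void)va;
    return 1;
}

/* What a trap handler might do: prefer the hook, else fall back. */
static int handle_fault(struct proc_sketch *p, unsigned long va)
{
    assert(p != NULL && p->p_emul != NULL);
    if (p->p_emul->e_fault != NULL)
        return p->p_emul->e_fault(p, va);
    return generic_fault(p, va);
}
```

Keeping the hook optional means only the emulation that needs special fault
handling has to supply a function; every other emulation leaves the field
NULL.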


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.
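
The version-based trampoline selection can be sketched as below. This is a
hedged illustration only: struct sigdesc_sketch, its field names and
select_trampoline() are hypothetical, and real sendsig() implementations are
machine-dependent and do considerably more.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-signal descriptor described above. */
struct sigdesc_sketch {
    void (*sd_handler)(int);   /* signal handler */
    const void *sd_tramp;      /* userland-provided trampoline, if any */
    int sd_vers;               /* 0 = legacy kernel-provided trampoline */
};

/* Pick the trampoline to return through, based on the recorded version. */
static const void *select_trampoline(const struct sigdesc_sketch *sd,
                                     const void *kernel_tramp)
{
    assert(sd != NULL);
    if (sd->sd_vers == 0)
        return kernel_tramp;
    return sd->sd_tramp;
}
```

Storing the version next to the trampoline pointer is what lets old binaries,
which never register a trampoline, keep working unchanged.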


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specifics should now be handled in OS-specific code. Linux
has been done, but other emulations still need handling. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data that has been collected so far:

                                    Net  Free  Open  Linux  SunOS  AIX  OSF1  Darwin
send SIGIO to write end of pipe     Y    N     N     N      N      N    Y     Y
send SIGIO to read end of pipe      Y    Y     N     N      N      ?    Y     ?
send SIGIO to write end of socket   Y    Y     Y     N      N      Y    Y     Y
send SIGIO to read end of socket    Y    Y     Y     Y      Y      ?    Y     ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke the build
of several emulation packages


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.
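
How the exec path might drive these hooks when the emulation changes can be
sketched as follows. The structures are simplified stand-ins with hypothetical
names (struct emul_sketch, struct proc_sketch, exec_set_emul()), and the two
counting hooks exist only to make the call order observable.

```c
#include <assert.h>
#include <stddef.h>

struct proc_sketch;

/* Hypothetical emulation descriptor with optional per-process hooks. */
struct emul_sketch {
    const char *e_name;
    void (*e_proc_exec)(struct proc_sketch *);
    void (*e_proc_exit)(struct proc_sketch *);
};

struct proc_sketch {
    const struct emul_sketch *p_emul;
    void *p_emuldata;            /* per-process emulation-specific data */
};

static int exec_calls, exit_calls;

/* Example hooks: count invocations; exit also drops the private data. */
static void count_exec(struct proc_sketch *p) { (void)p; exec_calls++; }
static void count_exit(struct proc_sketch *p)
{
    p->p_emuldata = NULL;
    exit_calls++;
}

/* On exec: let the old emulation clean up if the emulation changes. */
static void exec_set_emul(struct proc_sketch *p,
                          const struct emul_sketch *new_emul)
{
    const struct emul_sketch *old = p->p_emul;

    assert(new_emul != NULL);
    if (old != NULL && old != new_emul && old->e_proc_exit != NULL)
        old->e_proc_exit(p);
    p->p_emul = new_emul;
    if (new_emul->e_proc_exec != NULL)
        new_emul->e_proc_exec(p);
}
```

Calling the old emulation's exit hook before installing the new one gives it a
chance to release whatever it stored in p_emuldata.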


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
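
The shape of the compatibility macro can be sketched as below. This is not
the kernel implementation: ltsleep_sketch() does not actually block, the
interlock is a toy integer flag rather than a real lock type, and all names
ending in _sketch are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy interlock: 1 = held, 0 = released. Not a real kernel lock type. */
typedef volatile int interlock_sketch_t;

static int ltsleep_sketch(const volatile void *chan, int pri,
                          const char *wmesg, int timo,
                          interlock_sketch_t *interlock)
{
    (void)chan; (void)pri; (void)wmesg; (void)timo;

    /* A real implementation would lock the scheduler at this point ... */
    if (interlock != NULL) {
        assert(*interlock == 1);   /* caller must hold the interlock */
        *interlock = 0;            /* ... and only then release it */
    }
    /* ... then block until a wakeup on chan; the sketch just returns. */
    return 0;
}

/* Old API: same call, with no interlock to release. */
#define tsleep_sketch(chan, pri, wmesg, timo) \
    ltsleep_sketch((chan), (pri), (wmesg), (timo), NULL)
```

Releasing the interlock only after the scheduler is locked is what closes the
window in which a wakeup could slip in between the caller's check and the
sleep.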


# 1.97 31-May-2000 thorpej

Track which CPU a process is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.
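
The element-size argument described for KERN_PROC2 can be illustrated with a
minimal sketch. Everything here is hypothetical (struct ex_kinfo,
ex_copy_records(), and the zero-padding behaviour are assumptions made for
illustration, not the kernel's code); it only shows how a caller-supplied
element size lets an older binary keep receiving the record layout it was
built against.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record; the real struct kinfo_proc2 has many more fields. */
struct ex_kinfo {
	int pid;
	int ppid;
	int added_later;	/* field unknown to older binaries */
};

/*
 * Copy nelem records into dst, giving each one exactly elemsize bytes.
 * A caller built against a smaller struct gets a truncated prefix; a
 * caller asking for more than we have gets zero padding (an assumption
 * of this sketch).  Returns the number of bytes written.
 */
static size_t
ex_copy_records(void *dst, const struct ex_kinfo *src, size_t nelem,
    size_t elemsize)
{
	size_t ncopy = elemsize < sizeof(*src) ? elemsize : sizeof(*src);
	char *out = dst;
	size_t i;

	for (i = 0; i < nelem; i++) {
		memset(out, 0, elemsize);
		memcpy(out, &src[i], ncopy);
		out += elemsize;
	}
	return nelem * elemsize;
}
```

An "old" caller that only knows the first two fields would pass
2 * sizeof(int) as the element size and receive tightly packed pid/ppid
pairs, regardless of how large the record has since grown.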


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.
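
As a rough sketch of the difference this makes, compare the old and new
"is this process running?" tests. The ex_ names and the numeric state
values are placeholders chosen for this example; only the idea of a
dedicated on-processor state comes from the commit message.

```c
/* Illustrative state values; not taken from the header. */
enum ex_pstat { EX_SRUN = 2, EX_SONPROC = 7 };

struct ex_proc {
	enum ex_pstat p_stat;
};

/* Old style: "runnable" plus "is the current process" meant running. */
static int
ex_running_old(const struct ex_proc *p, const struct ex_proc *curproc)
{
	return p->p_stat == EX_SRUN && p == curproc;
}

/* New style: the state itself says the process is on a processor. */
static int
ex_running_new(const struct ex_proc *p)
{
	return p->p_stat == EX_SONPROC;
}

/* What context switch code is now expected to do for the incoming process. */
static void
ex_switch_to(struct ex_proc *p)
{
	p->p_stat = EX_SONPROC;
}
```

The new test no longer depends on curproc, which is what makes it usable
when more than one CPU can be running a process at the same time.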


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
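
The sizing rule above amounts to a divide and a clamp. A minimal sketch,
assuming made-up bounds (the EX_ constants below stand in for whatever
machine-dependent code or NKMEMPAGES_MIN/NKMEMPAGES_MAX would supply):

```c
#include <stddef.h>

/* Hypothetical bounds, in pages; real values are platform-specific. */
#define EX_NKMEMPAGES_MIN	512
#define EX_NKMEMPAGES_MAX	8192

/* Take a quarter of physical memory, then apply the lower and upper bounds. */
static size_t
ex_nkmempages(size_t physmem_pages)
{
	size_t npages = physmem_pages / 4;

	if (npages < EX_NKMEMPAGES_MIN)
		npages = EX_NKMEMPAGES_MIN;
	if (npages > EX_NKMEMPAGES_MAX)
		npages = EX_NKMEMPAGES_MAX;
	return npages;
}
```

With these example bounds, a machine with 16384 pages of physical memory
would get 4096 pages, while very small and very large machines are pinned
to the minimum and maximum respectively.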


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). Process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock every where else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
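
A sketch of what such a macro and the reaper's state change could look
like. The EX_ names and numeric values are placeholders for this example,
and the macro body is an assumption based on the description above, not a
copy of the header.

```c
/* Illustrative state values; not taken from the header. */
enum { EX_SZOMB = 5, EX_SDEAD = 6 };

struct ex_proc {
	int p_stat;
};

/* Treat both "dead, not yet reaped" and "zombie" as zombie-like. */
#define EX_P_ZOMBIE(p) \
	((p)->p_stat == EX_SZOMB || (p)->p_stat == EX_SDEAD)

/* The reaper's job in this sketch: promote SDEAD to SZOMB for wait4. */
static void
ex_reap(struct ex_proc *p)
{
	if (p->p_stat == EX_SDEAD)
		p->p_stat = EX_SZOMB;
}
```

Code that previously tested only for SZOMB can use the macro and remain
correct during the window between exit and reaping.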


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is the
header files as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
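
The effect of NZERO = 20 can be shown with a small sketch. The helper
names are hypothetical, and the sketch assumes user-visible nice values in
the usual range of -20 to 19; the point is only that the stored value is
offset so it never goes negative and therefore fits an unsigned type.

```c
#define EX_NZERO 20	/* offset added to the user-visible nice value */

/* Convert a user-visible nice value (-20..19) to the stored form (0..39). */
static unsigned char
ex_nice_to_pnice(int nice)
{
	return (unsigned char)(nice + EX_NZERO);
}

/* Convert the stored form back to the user-visible nice value. */
static int
ex_pnice_to_nice(unsigned char p_nice)
{
	return (int)p_nice - EX_NZERO;
}
```

Under these assumptions the most favourable nice value, -20, is stored as
0, and the default of 0 is stored as 20.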


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.
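
A minimal sketch of the hold/release idea, assuming the macros build on
the hold count introduced in revision 1.27. The ex_ names are hypothetical,
and the sketch leaves out any swap-in that a real hold might have to
trigger.

```c
struct ex_proc {
	int p_holdcnt;	/* number of outstanding holds */
};

/* Take a hold: the process must stay in core while the count is nonzero. */
static void
ex_phold(struct ex_proc *p)
{
	p->p_holdcnt++;
}

/* Drop a hold taken earlier with ex_phold(). */
static void
ex_prele(struct ex_proc *p)
{
	p->p_holdcnt--;
}

/* A swapper would only consider processes with no holds. */
static int
ex_swappable(const struct ex_proc *p)
{
	return p->p_holdcnt == 0;
}
```

Because the state is a count rather than a flag, holds from independent
code paths can nest without one release cancelling another's hold.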


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location , and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.369 10-Oct-2021 thorpej

Changes to make EVFILT_PROC MP-safe:

Because the locking protocol around processes is somewhat complex
compared to other events that can be posted on kqueues, introduce
new functions for posting NOTE_EXEC, NOTE_EXIT, and NOTE_FORK,
rather than just using the generic knote() function. These functions
KASSERT() their locking expectations, and deal with other complexities
for each situation.

knote_proc_fork(), in particular, needs to handle NOTE_TRACK, which
requires allocation of a new knote to attach to the child process. We
don't want to be allocating memory while holding the parent's p_lock.
Furthermore, we also have to attach the tracking note to the child
process, which means we have to acquire the child's p_lock.

So, to handle all this, we introduce some additional synchronization
infrastructure around the 'knote' structure:

- Add the ability to mark a knote as being in a state of flux. Knotes
in this state are guaranteed not to be detached/deleted, thus allowing
a code path to drop other locks after putting a knote in this state.

- Code paths that wish to detach/delete a knote must first check if the
knote is in-flux. If so, they must wait for it to quiesce. Because
multiple threads of execution may attempt this concurrently, a mechanism
exists for a single LWP to claim the detach responsibility; all other
threads simply wait for the knote to disappear before they can make
further progress.

- When kqueue_scan() encounters an in-flux knote, it simply treats the
situation just like encountering another thread's queue marker -- wait
for the flux to settle and continue on.

(The "in-flux knote" idea was inspired by FreeBSD, but this works differently
from their implementation, as the two kqueue implementations have diverged
quite a bit.)

knote_proc_fork() uses this infrastructure to implement NOTE_TRACK like so:

- Attempt to put the original tracking knote into a state of flux; if this
fails (because the note has a detach pending), we skip all processing
(the original process has lost interest, and we simply won the race).

- Once the note is in-flux, drop the kq and forking process's locks, and
allocate 2 knotes: one to post the NOTE_CHILD event, and one to attach
a new NOTE_TRACK to the child process. Notably, we do NOT go through
kqueue_register() to do this, but rather do all of the work directly
and KASSERT() our assumptions; this allows us to directly control our
interaction with locks. All memory allocations here are performed with
KM_NOSLEEP, in order to prevent holding the original knote in-flux
indefinitely.

- Because the NOTE_TRACK use case adds knotes to kqueues through a
sort of back-door mechanism, we must serialize with the closing of
the destination kqueue's file descriptor, so steal another bit from
the kq_count field to notify other threads that a kqueue is on its
way out to prevent new knotes from being enqueued while the close
path detaches them.

In addition to fixing EVFILT_PROC's reliance on KERNEL_LOCK, this also
fixes a long-standing bug whereby a NOTE_CHILD event could be dropped
if the child process exited before the interested process received the
NOTE_CHILD event (the same knote would be used to deliver the NOTE_EXIT
event, and would clobber the NOTE_CHILD's 'data' field).

Add a bunch of comments to explain what's going on in various critical
sections, and sprinkle additional KASSERT()s to validate assumptions
in several more locations.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.368 05-Dec-2020 thorpej

Refactor interval timers to make it possible to support types other than
the BSD/POSIX per-process timers:

- "struct ptimer" is split into "struct itimer" (common interval timer
data) and "struct ptimer" (per-process timer data, which contains a
"struct itimer").

- Introduce a new "struct itimer_ops" that supplies information about
the specific kind of interval timer, including its processing
queue, the softint handle used to schedule processing, the function
to call when the timer fires (which adds it to the queue), and an
optional function to call when the CLOCK_REALTIME clock is changed by
a call to clock_settime() or settimeofday().

- Rename some functions to clearly identify what they're operating on
(ptimer vs itimer).

- Use kmem(9) to allocate ptimer-related structures, rather than having
dedicated pools for them.

Welcome to NetBSD 9.99.77.


# 1.367 23-May-2020 ad

branches: 1.367.2;
Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.366 23-May-2020 ad

- Replace pid_table_lock with a lockless lookup covered by pserialize, with
the "writer" side being pid_table expansion. The basic idea is that when
doing an LWP lookup there is usually already a lock held (p->p_lock), or a
spin mutex that needs to be taken (l->l_mutex), and either can be used to
get the found LWP stable and confidently determine that all is correct.

- For user processes LSLARVAL implies the same thing as LSIDL ("not visible
by ID"), and lookup by ID in proc0 doesn't really happen. In-tree the new
state should be understood by top(1), the tty subsystem and so on, and
would attract the attention of 3rd party kernel grovellers in time, so
remove it and just rely on LSIDL.


# 1.365 07-May-2020 kamil

On debugger attach to a prestarted process don't report SIGTRAP

Introduce PSL_TRACEDCHILD that indicates tracking of birth of a process.
A freshly forked process checks whether it is traced and if so, reports
SIGTRAP + TRAP_CHLD event to a debugger as a result of tracking forks-like
events. There is a time window when a debugger can attach to a newly
created process and receive SIGTRAP + TRAP_CHLD instead of SIGSTOP.

Fixes races in t_ptrace_wait* tests when a test hangs or misbehaves,
especially the ones reported in tracer_sysctl_lookup_without_duplicates.


# 1.364 29-Apr-2020 thorpej

- proc_find() retains traditional semantics of requiring the canonical
PID to look up a proc. Add a separate proc_find_lwpid() to look up a
proc by the ID of any of its LWPs.
- Add proc_find_lwp_acquire_proc(), which enables looking up the LWP
*and* a proc given the ID of any LWP. Returns with the proc::p_lock
held.
- Rewrite lwp_find2() in terms of proc_find_lwp_acquire_proc(), and
allow the proc to be wildcarded, rather than just curproc or specific
proc.
- lwp_find2() now subsumes the original intent of lwp_getref_lwpid(), but
in a much nicer way, so garbage-collect the remnants of that recently
added mechanism.


Revision tags: bouyer-xenpvh-base2
# 1.363 24-Apr-2020 thorpej

Overhaul the way LWP IDs are allocated. Instead of each LWP having its
own LWP ID space, LWP IDs now come from the same number space as PIDs. The
lead LWP of a process gets the PID as its LID. If a multi-LWP process's
lead LWP exits, the PID persists for the process.

In addition to providing system-wide unique thread IDs, this also lets us
eliminate the per-process LWP radix tree, and some associated locks.

Remove the separate "global thread ID" map added previously; it is no longer
needed to provide this functionality.

Nudged in this direction by ad@ and chs@.


Revision tags: phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.362 06-Apr-2020 kamil

branches: 1.362.2;
Reintroduce struct proc::p_oppid

Relying on p_opptr is not safe as there is a race between:
- spawner giving a birth to a child process and being killed
- spawnee accessing p_opptr and reporting TRAP_CHLD

PR kern/54786 by Andreas Gustafsson


# 1.361 05-Apr-2020 christos

There is no "s" lock.


# 1.360 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.359 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.358 29-Jan-2020 ad

- Track LWPs in a per-process radixtree. It uses no extra memory in the
single threaded case. Replace scans of p->p_lwps with lookups in the
tree. Find free LIDs for new LWPs in the tree. Replace the hashed sleep
queues for park/unpark with lookups in the tree under cover of a RW lock.

- lwp_wait(): if waiting on a specific LWP, find the LWP via tree lookup and
return EINVAL if it's detached, not ESRCH.

- Group the locks in struct proc at the end of the struct in their own cache
line.

- Add some comments.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.357 12-Oct-2019 kamil

branches: 1.357.2;
Remove now unused p_oppid from struct proc


# 1.356 30-Sep-2019 kamil

Move TRAP_CHLD/TRAP_LWP ptrace information from struct proc to siginfo

Storing struct ptrace_state information inside struct proc was vulnerable
to synchronization bugs, as multiple events emitted at the same time were
overwriting other ones.

Cache the original parent process id in p_oppid. Reusing here p_opptr is
in theory prone to slight race condition.

Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for calls
prompting for the value in cases when there wasn't registered an
appropriate event.

Add an alternative approach to check the ptrace_state information, directly
from the siginfo_t value returned from PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compat with older NetBSD and
OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.

Add a couple of compile-time asserts for assumptions in the code.

No functional change intended in existing ptrace(2) software.

All ATF ptrace(2) and ATF GDB tests pass.

This change improves reliability of the threading ptrace(2) code.


Revision tags: netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.355 15-Jul-2019 pgoyette

Move a comment line to get it next to the line it describes, avoiding
intervening unrelated text.

NFCI


# 1.354 21-Jun-2019 kamil

Eliminate PS_NOTIFYSTOP remnants from the kernel

This flag used to be useful in /proc (BSD4.4-style) debugging semantics.
Traced child events were notified without signaling the parent.

This property was removed in NetBSD-8.0 and had no users.

This change simplifies the signal code, removing dead branches.

NFCI


# 1.353 11-Jun-2019 kamil

Add support for PTRACE_POSIX_SPAWN to report posix_spawn(3) events

posix_spawn(3) is a first class syscall in NetBSD, different to
(V)FORK+EXEC as these operations are executed in one go. This differs to
Linux and FreeBSD, where posix_spawn(3) is implemented with existing kernel
primitives (clone(2), vfork(2), exec(3)) inside libc.

Typically LLDB and GDB software is aware of FORK/VFORK events. As discussed
with the LLDB community, instead of slicing the posix_spawn(3) operation
into phases emulating (V)FORK+EXEC(+VFORK_DONE) and returning intermediate
state to the debugger, that might have abnormal state, introduce new event
type: PTRACE_POSIX_SPAWN.

A debugger implementor can easily map it into existing fork+exec semantics
or treat as a distinct event.

There is no functional change for existing debuggers as there was no
support for reporting posix_spawn(3) events on the kernel side.


Revision tags: phil-wifi-20190609 isaki-audio2-base
# 1.352 06-Apr-2019 kamil

Centralized shared part of child_return() into MI part

Add a new function md_child_return() for MD specific bits only.

New child_return() is now part of MI and central code that handles
uniformly tracing code (KTR and ptrace(2)).

Synchronize value passed to ktrsysret() among ports to SYS_fork. This is
a traditional value and accessing p_lflag to check for PL_PPWAIT shall
use locking against proc_lock. Returning SYS_fork vs SYS_vfork still isn't
correct enough as there are more entry points to forking code. Instead of
making it too good, just settle with plain SYS_fork for all ports.


# 1.351 01-Mar-2019 christos

PR/53998: Joel Bertrand: Limit the number of semaphores on a
per-user basis not a per-process. We cannot really keep track on
a per-process basis because a parent process can create the semaphore
and a child can free it taking credit for it. There is also a
similar issue about resource exhaustion if we limited the number
of lwps per process as opposed to per user (which we don't).


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.350 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
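
The tri-state kern.expose_address policy listed above can be sketched as a
small decision function. This is only an illustration under stated
assumptions: the enum names, expose_default() and may_see_addresses() are
hypothetical, and has_kmem_open stands in for whatever check the kernel
really performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Tri-state values as listed in the commit message. */
enum expose_mode {
	EXPOSE_NONE = 0,	/* no access */
	EXPOSE_KMEM = 1,	/* only processes with /dev/kmem open */
	EXPOSE_ALL  = 2		/* access to everyone */
};

/* Default depends on whether the kernel was built with KASLR. */
static enum expose_mode
expose_default(bool kaslr)
{
	return kaslr ? EXPOSE_NONE : EXPOSE_KMEM;
}

/* has_kmem_open is a hypothetical stand-in for the per-process check. */
static bool
may_see_addresses(enum expose_mode mode, bool has_kmem_open)
{
	switch (mode) {
	case EXPOSE_ALL:
		return true;
	case EXPOSE_KMEM:
		return has_kmem_open;
	case EXPOSE_NONE:
	default:
		return false;
	}
}
```

Evaluating the mode once per sysctl request and passing the result down, as
the commit describes, avoids repeating the lookup for every process.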


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.349 10-Aug-2018 pgoyette

Allow syscall_establish() to install new syscalls when the existing
entry-point is either sys_nomodule or sys_nosys. Update the
makesyscalls.sh script to create a const array of bits to allow
syscall_disestablish() to properly restore the original entry-point.
Update all the initializers of struct emul to initialize the pointer
to the bit array struct emul.

XXX Regen of all files created by makesyscalls.sh will come soon,
XXX followed by a kernel version bump (since struct emul is being
XXX modified).

This commit should address PR kern/45781 and also removes the need
for the work-around for that PR in file

sys/arch/usermode/modules/syscallemu/syscallemu.c


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.348 09-May-2018 kre

branches: 1.348.2;

Cause a process's user and system times to become non-decreasing.

This alters the invented values (ie: statistically calculated)
that are returned - for small values, the values are likely going to
be different than they were, but that's largely nonsense anyway
(except that the sum of utime & stime does equal cpu time consumed
by the process). Once the values get large enough to be meaningful
the difference made by this change will be in the noise, and irrelevant.

This needs a couple of additions to struct proc, so we are now into 8.99.17


# 1.347 06-May-2018 kamil

Remove an element from struct emul: e_tracesig

e_tracesig used to be implemented for Darwin compat. Nowadays the Darwin
compatibility layer is gone and there are no other users.

This functionality isn't used where it shall be used in the existing
codebase.

If we want to emulate debugging interfaces in compat layers we would need
to implement that from scratch anyway. We would need to be bug compatible
with other OSes too.

Proposed on tech-kern@.

Welcome to NetBSD 8.99.16!

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422
# 1.346 19-Apr-2018 christos

s/static inline/static __inline/g for consistency with other include
headers.


# 1.345 16-Apr-2018 kamil

Remove the rnewprocp argument from fork1(9)

It's now unused and it can cause use-after-free scenarios as noted by
<Mateusz Guzik>.

Reference: http://mail-index.netbsd.org/tech-kern/2017/09/08/msg022267.html

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.344 09-Jan-2018 maya

branches: 1.344.2;
remove struct emul's e_fault.

It used to be used by COMPAT_IRIX for the purpose of overriding
uvm_fault (only implemented in MIPS), now removed.

Ride 8.99.12 version bump.


Revision tags: tls-maxphys-base-20171202
# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->p_path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change affects neither the other procfs files nor the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

branches: 1.340.6;
factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API and its usage.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, so remove its remnants without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event;
a debugger is responsible for detaching this trap with an appropriate
PT_SETDBREGS call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel
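
The last design choice above can be sketched as a validation pass over the
requested registers. This is a minimal illustration, not the kernel's code:
it relies on the x86 DR7 layout (bits 2*i and 2*i+1 enable breakpoint i),
and dr_enabled(), dbregs_user_only() and the user_limit parameter are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 DR7: bits 2*i (local) and 2*i+1 (global) enable breakpoint i. */
static bool
dr_enabled(uint64_t dr7, int i)
{
	assert(i >= 0 && i < 4);
	return ((dr7 >> (2 * i)) & 0x3) != 0;
}

/* user_limit is a hypothetical top of the user address space. */
static bool
dbregs_user_only(const uint64_t dr[4], uint64_t dr7, uint64_t user_limit)
{
	for (int i = 0; i < 4; i++) {
		if (dr_enabled(dr7, i) && dr[i] >= user_limit)
			return false;	/* would trap on a kernel address */
	}
	return true;
}
```

Only enabled slots are checked, so a stale kernel address in a disabled
register does not cause the request to be rejected in this sketch.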

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
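
The size and layout claim above can be checked at compile time. This is a
sketch under the stated assumption that both types are 32-bit integers;
demo_pid_t, demo_lwpid_t, struct old_state and struct new_state are stand-in
names, not the system definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the system typedefs; both assumed to be 32-bit integers. */
typedef int32_t demo_pid_t;
typedef int32_t demo_lwpid_t;

struct old_state {
	int		pe_report_event;
	demo_pid_t	pe_other_pid;
};

struct new_state {
	int		pe_report_event;
	union {
		demo_pid_t	_pe_other_pid;
		demo_lwpid_t	_pe_lwp;
	} _option;
};

/* The union adds no bytes, so prebuilt binaries see the same size. */
_Static_assert(sizeof(struct old_state) == sizeof(struct new_state),
    "state structure size must not change");

/* The payload stays at the same offset as the old pid field. */
_Static_assert(offsetof(struct old_state, pe_other_pid) ==
    offsetof(struct new_state, _option),
    "payload offset must not change");
```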


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP termination is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
parent gives birth to new process child and stops till it exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE is notification to notify a debugger that a parent has
resumed after vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
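
The idea can be sketched as follows: the pointer size is chosen from the
traced process, not from the kernel. The names below (DEMO_PK_32,
struct demo_proc, demo_proc_ptrsz()) are hypothetical and only illustrate
the approach; the real macro may be defined differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PK_32	0x1	/* hypothetical flag: process runs 32-bit code */

struct demo_proc {
	int p_flag;
};

/* Pointer size as seen by the process, not by the kernel. */
static size_t
demo_proc_ptrsz(const struct demo_proc *p)
{
	return (p->p_flag & DEMO_PK_32) ? sizeof(uint32_t) : sizeof(void *);
}
```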


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
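
With the three pieces stored separately, the composite wait(2) status can be
rebuilt on demand. This is a sketch assuming the traditional BSD encoding
(exit code in the second byte, signal number in the low byte, 0x80 as the
core bit); struct demo_exit, DEMO_COREFLAG and demo_wait_status() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_COREFLAG	0x80	/* assumed "core dumped" bit of the low byte */

/* Fields kept separately after the split. */
struct demo_exit {
	int  p_xexit;	/* exit code */
	int  p_xsig;	/* signal number, 0 if none */
	bool coredump;	/* process dumped core */
};

/* Rebuild the composite status only where a caller needs it. */
static int
demo_wait_status(const struct demo_exit *e)
{
	int low = e->p_xsig | (e->coredump ? DEMO_COREFLAG : 0);

	return ((e->p_xexit & 0xff) << 8) | low;
}
```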


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It is implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture, the
MAP_NOSYSCALLS flag is simply ignored with virtually no CPU cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes to sleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.
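
One plausible way such an untimed sleep can arise, shown as a standalone
sketch and not as the actual kernel code: if a timeout of zero ticks is the
convention for "no timeout", a very short interval that rounds down to zero
must be bumped up. deadline_to_ticks() and its parameters are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert an absolute deadline into a tick count for a timed sleep.
 * Returns -1 if the deadline has already passed.  Never returns 0,
 * because 0 is assumed here to mean "sleep without a timeout".
 */
static int64_t
deadline_to_ticks(int64_t now_ns, int64_t deadline_ns, int64_t ns_per_tick)
{
	int64_t ticks;

	assert(ns_per_tick > 0);
	if (deadline_ns <= now_ns)
		return -1;		/* already expired: do not sleep */
	ticks = (deadline_ns - now_ns) / ns_per_tick;
	if (ticks == 0)
		ticks = 1;		/* sub-tick interval: round up, not down */
	return ticks;
}
```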


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give the same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.
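
A simplified sketch of the credential caching described in the first two
items. All types and functions here (cred_sketch, lwp_update_cred(), ...)
are hypothetical, and the locking the kernel would need is left out:

```c
#include <assert.h>
#include <stddef.h>

struct cred_sketch { unsigned refcnt; int uid; };
struct proc_sketch { struct cred_sketch *p_cred; };
struct lwp_sketch {
	struct proc_sketch *l_proc;
	struct cred_sketch *l_cred;	/* cached, reference held */
};

static void
cred_hold(struct cred_sketch *c)
{
	c->refcnt++;
}

static void
cred_rele(struct cred_sketch *c)
{
	assert(c->refcnt > 0);
	c->refcnt--;
}

/* Called on syscall entry or user trap: refresh the cached credential. */
static void
lwp_update_cred(struct lwp_sketch *l)
{
	struct cred_sketch *pc = l->l_proc->p_cred;

	if (l->l_cred == pc)
		return;			/* common case: nothing changed */
	cred_hold(pc);
	if (l->l_cred != NULL)
		cred_rele(l->l_cred);
	l->l_cred = pc;
}
```

The point of the cached pointer is that code running on behalf of the LWP
sees one stable credential for the duration of a system call.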


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but no longer do.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.
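
The fork/wait bookkeeping can be sketched as below. This is a simplified
model with hypothetical names; it leaves out the estcpu decay and any upper
limit that the real hooks apply:

```c
#include <assert.h>

struct sched_sketch {
	unsigned estcpu;	/* accumulated CPU usage estimate */
	unsigned forked_estcpu;	/* portion inherited at fork time */
};

/* At fork: the child inherits the parent's estcpu and remembers how much. */
static void
fork_hook_sketch(const struct sched_sketch *parent, struct sched_sketch *child)
{
	child->estcpu = parent->estcpu;
	child->forked_estcpu = parent->estcpu;
}

/* At wait: charge the parent only for what the child added itself. */
static void
wait_hook_sketch(struct sched_sketch *parent, const struct sched_sketch *child)
{
	assert(parent != child);
	if (child->estcpu > child->forked_estcpu)
		parent->estcpu += child->estcpu - child->forked_estcpu;
}
```

Without the forked_estcpu record, a child that exits immediately would hand
the inherited value back to the parent, doubling it on every cycle.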


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is not likely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
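
A sketch of why a fixed-point estcpu decays more gently under load. It
assumes the classic BSD decay factor 2*load / (2*load + 1) and an 11-bit
fraction; the names SK_FSHIFT, fixpt_sketch_t and decay_estcpu() are
hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define SK_FSHIFT	11			/* assumed fraction bits */
#define SK_FSCALE	(1u << SK_FSHIFT)	/* fixed-point 1.0 */

typedef uint32_t fixpt_sketch_t;

/*
 * Decay estcpu by 2*load / (2*load + 1).  Both arguments are fixed-point
 * values scaled by SK_FSCALE; the result uses the same scale.
 */
static fixpt_sketch_t
decay_estcpu(fixpt_sketch_t estcpu, fixpt_sketch_t loadavg)
{
	uint64_t loadfac = 2ull * loadavg;
	uint64_t r = (loadfac * estcpu) / (loadfac + SK_FSCALE);

	assert(r <= estcpu);	/* decay never increases the value */
	return (fixpt_sketch_t)r;
}
```

With a plain integer, an estcpu of 1 can only drop to 0 in one step. In
fixed point the same value of 1.0 at a load average of 20 loses only a
small fraction per step, so busy systems keep a usable usage history.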


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
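
A standalone sketch of iteration that skips marker entries. It uses a
hand-rolled singly linked list instead of <sys/queue.h>, and every name in
it (entry_sketch, SKETCH_FOREACH, sum_pids) is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct entry_sketch {
	struct entry_sketch *next;
	bool is_marker;		/* placeholder left by an iterator */
	int pid;
};

/* Advance past any marker entries; returns NULL at the end of the list. */
static struct entry_sketch *
skip_markers(struct entry_sketch *e)
{
	while (e != NULL && e->is_marker)
		e = e->next;
	return e;
}

/* Visit only real entries, in list order. */
#define SKETCH_FOREACH(var, head)				\
	for ((var) = skip_markers(head); (var) != NULL;		\
	    (var) = skip_markers((var)->next))

/* Example consumer: add up the pids of all real entries. */
static int
sum_pids(struct entry_sketch *head)
{
	struct entry_sketch *e;
	int sum = 0;

	SKETCH_FOREACH(e, head) {
		assert(!e->is_marker);
		sum += e->pid;
	}
	return sum;
}
```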


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep awaiting the debugger's opinion
about the signal without it being a problem. The hook is hence located in
issignal, at the place where normally SIGCHLD is sent to the debugger,
while the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception), and stop the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.
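
The control flow can be sketched in userland as follows. All names are
hypothetical, and unlike the chooseproc() described above, choose_next()
does not wait for a process to appear; it returns NULL on an empty queue:

```c
#include <assert.h>
#include <stddef.h>

struct sw_proc { int pid; };

#define RUNQ_MAX 8
static struct sw_proc *runq[RUNQ_MAX];
static size_t runq_len;

/* Append a process to the (FIFO) run queue. */
static void
runq_put(struct sw_proc *p)
{
	assert(runq_len < RUNQ_MAX);
	runq[runq_len++] = p;
}

/* Take the first process off the run queue, or NULL if it is empty. */
static struct sw_proc *
choose_next(void)
{
	struct sw_proc *p;
	size_t i;

	if (runq_len == 0)
		return NULL;
	p = runq[0];
	for (i = 1; i < runq_len; i++)
		runq[i - 1] = runq[i];
	runq_len--;
	return p;
}

/*
 * Switch from cur to newp.  A NULL newp asks the scheduler to choose.
 * Returns the process that runs afterwards.
 */
static struct sw_proc *
switch_sketch(struct sw_proc *cur, struct sw_proc *newp)
{
	if (newp == NULL)
		newp = choose_next();
	if (newp == NULL || newp == cur)
		return cur;	/* "switching to self": nothing to save/restore */
	return newp;
}
```

Keeping the choice of the next process out of the switch routine is what
lets a caller hand over the CPU to a specific process directly.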


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct emul to provide an emulation-specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.
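
A sketch of the per-signal descriptor and the version check. Field and
function names are hypothetical; only the rule that version 0 selects the
legacy kernel-provided trampoline is taken from the description above:

```c
#include <assert.h>
#include <stddef.h>

typedef void (*sig_handler_sketch)(int);

/* Hypothetical per-signal descriptor: action plus trampoline info. */
struct sigdesc_sketch {
	sig_handler_sketch sd_handler;	/* signal handler */
	const void *sd_tramp;		/* userland trampoline, if any */
	int sd_vers;			/* 0 = legacy kernel trampoline */
};

/* Pick the trampoline to use when delivering the signal. */
static const void *
pick_trampoline(const struct sigdesc_sketch *sd, const void *kernel_tramp)
{
	assert(sd != NULL);
	if (sd->sd_vers == 0)
		return kernel_tramp;	/* old binaries: kernel-provided */
	return sd->sd_tramp;		/* registered by userland */
}
```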


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed the obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
OS-specific async I/O behavior should now be handled in OS-specific code. Linux
has been done, but the other emulations still need to be handled. See case
LINUX_F_SETFL in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for
more details.

The data collected so far:

                                   Net  Free Open Linux SunOS AIX  OSF1 Darwin
send SIGIO to write end of pipe    Y    N    N    N     N     N    Y    Y
send SIGIO to read end of pipe     Y    Y    N    N     N     ?    Y    ?
send SIGIO to write end of socket  Y    Y    Y    N     N     Y    Y    Y
send SIGIO to read end of socket   Y    Y    Y    Y     Y     ?    Y    ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke several
emulation package builds


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
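
As a rough illustration of the compatibility macro described above, the
sketch below expresses a four-argument tsleep() in terms of a five-argument
ltsleep() by passing a null interlock. The stub body, the lock type and all
sketch_* names are assumptions made for illustration; they are not the
kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a simple lock; not the kernel's type. */
struct sketch_lock {
	int held;
};

/*
 * Stub ltsleep(): a real one would block the caller.  This one only
 * models the interlock handoff: if an interlock was passed, it is
 * released, and the function reports whether one was supplied.
 */
static int
sketch_ltsleep(const void *chan, int pri, const char *wmesg, int timo,
    struct sketch_lock *interlock)
{
	(void)chan; (void)pri; (void)wmesg; (void)timo;
	if (interlock != NULL) {
		assert(interlock->held);
		interlock->held = 0;
		return 1;
	}
	return 0;
}

/* Old four-argument API expressed in terms of the new call. */
#define sketch_tsleep(chan, pri, wmesg, timo) \
	sketch_ltsleep((chan), (pri), (wmesg), (timo), NULL)
```

Existing callers keep their four-argument calls unchanged, while new callers
can hand over a lock that is dropped on their behalf.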


# 1.97 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.
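
The element-size argument of KERN_PROC2 can be pictured with the sketch
below: the producer writes exactly as many bytes per record as the caller
asked for, so a caller built against an older, smaller layout still gets
usable records. The record layout and the sketch_* names are hypothetical;
the real struct kinfo_proc2 and the sysctl handler are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record; the real struct kinfo_proc2 has many more fields. */
struct sketch_info {
	int pid;
	int uid;
	int added_later;	/* field an old binary does not know about */
};

/*
 * Copy `count` records into `dst`, writing exactly `elemsize` bytes per
 * record as requested by the caller.  A caller built against an older,
 * smaller layout passes a smaller elemsize and gets truncated records.
 * Returns the number of bytes written.
 */
static size_t
sketch_copy_records(void *dst, size_t elemsize,
    const struct sketch_info *src, size_t count)
{
	unsigned char *out = dst;
	size_t n = elemsize < sizeof(*src) ? elemsize : sizeof(*src);
	size_t i;

	assert(dst != NULL && src != NULL);
	for (i = 0; i < count; i++) {
		memset(out, 0, elemsize);	/* zero any space past the record */
		memcpy(out, &src[i], n);
		out += elemsize;
	}
	return count * elemsize;
}
```

This only works if new fields are appended at the end of the record, which
is an assumption of the sketch.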


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
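
The sizing rule described above (a quarter of physical memory, clamped by
machine-dependent bounds) can be sketched as follows. The function name and
parameters are hypothetical; the bounds stand in for what NKMEMPAGES_MIN and
NKMEMPAGES_MAX would supply.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return physpages / 4, clamped to the range [min_pages, max_pages].
 * All values are page counts.
 */
static size_t
sketch_nkmempages(size_t physpages, size_t min_pages, size_t max_pages)
{
	size_t n = physpages / 4;

	assert(min_pages <= max_pages);
	if (n < min_pages)
		n = min_pages;
	if (n > max_pages)
		n = max_pages;
	return n;
}
```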


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). The process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock in
the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock every where else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
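
A minimal sketch of the state test described above, assuming made-up state
values and sketch_* names (the real definitions live in sys/proc.h and may
differ):

```c
#include <assert.h>

/* Hypothetical state values; the real ones live in sys/proc.h. */
enum sketch_pstate { SK_SRUN = 1, SK_SDEAD, SK_SZOMB };

struct sketch_proc {
	enum sketch_pstate p_stat;
};

/* True for a process that is dead, whether or not it has been reaped. */
#define SKETCH_P_ZOMBIE(p) \
	((p)->p_stat == SK_SZOMB || (p)->p_stat == SK_SDEAD)

/* Reaper step: a dead process becomes a zombie that wait4 can collect. */
static void
sketch_reap(struct sketch_proc *p)
{
	assert(p->p_stat == SK_SDEAD);
	p->p_stat = SK_SZOMB;
}
```

Code that only cares whether a process is gone can test the macro and ignore
which of the two states it is in.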


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is the
header files as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
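
The effect of the NZERO change can be sketched as a simple offset: the
user-visible nice value is stored with NZERO added, so the stored value fits
an unsigned char. The helper names are hypothetical, and the user-visible
range of -20 to 20 is an assumption of this sketch.

```c
#include <assert.h>

#define SKETCH_NZERO 20

/* Convert a user-visible nice value (assumed -20..20) to the stored form. */
static unsigned char
sketch_nice_to_stored(int nice)
{
	assert(nice >= -SKETCH_NZERO && nice <= SKETCH_NZERO);
	return (unsigned char)(nice + SKETCH_NZERO);
}

/* Convert the stored form back to the user-visible nice value. */
static int
sketch_stored_to_nice(unsigned char p_nice)
{
	return (int)p_nice - SKETCH_NZERO;
}
```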


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.
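
One way to picture such a hold/release pair is a per-process counter that
must be zero before the process may be swapped out. The struct, field and
macro names below are hypothetical, and the real macros may do more work than
this sketch.

```c
#include <assert.h>

/* Hypothetical process record with only the hold counter. */
struct sketch_hproc {
	int p_holdcnt;
};

/* Take a hold: the process must stay in core while the count is nonzero. */
#define SKETCH_PHOLD(p)	((void)(p)->p_holdcnt++)

/* Drop a hold taken earlier with SKETCH_PHOLD(). */
#define SKETCH_PRELE(p) do {			\
	assert((p)->p_holdcnt > 0);		\
	(p)->p_holdcnt--;			\
} while (0)

/* A process is eligible for swap-out only when nobody holds it. */
static int
sketch_swappable(const struct sketch_hproc *p)
{
	return p->p_holdcnt == 0;
}
```

A counter nests naturally: two independent holders can each take and drop a
hold without knowing about the other.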


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location, and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.368 05-Dec-2020 thorpej

Refactor interval timers to make it possible to support types other than
the BSD/POSIX per-process timers:

- "struct ptimer" is split into "struct itimer" (common interval timer
data) and "struct ptimer" (per-process timer data, which contains a
"struct itimer").

- Introduce a new "struct itimer_ops" that supplies information about
the specific kind of interval timer, including its processing
queue, the softint handle used to schedule processing, the function
to call when the timer fires (which adds it to the queue), and an
optional function to call when the CLOCK_REALTIME clock is changed by
a call to clock_settime() or settimeofday().

- Rename some functions to clearly identify what they're operating on
(ptimer vs itimer).

- Use kmem(9) to allocate ptimer-related structures, rather than having
dedicated pools for them.
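
The split described in the list above can be pictured with a minimal sketch. It is not the kernel's code: every name below (the `toy_` types, `ito_fire`, `pt_owner_pid`, and so on) is a hypothetical stand-in, and only the general shape is taken from the log, namely a common timer structure carrying an ops pointer, embedded in a per-process structure.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's definitions. */
struct toy_itimer;

struct toy_itimer_ops {
    void (*ito_fire)(struct toy_itimer *);             /* called when the timer fires */
    void (*ito_realtime_changed)(struct toy_itimer *); /* optional, may be NULL */
};

struct toy_itimer {
    const struct toy_itimer_ops *it_ops; /* identifies the kind of timer */
    int it_fired;                        /* toy state: number of expirations */
};

struct toy_ptimer {
    struct toy_itimer pt_itimer; /* common interval timer data */
    int pt_owner_pid;            /* toy per-process data */
};

/* Recover the enclosing per-process timer from its embedded common part. */
static struct toy_ptimer *
toy_itimer_to_ptimer(struct toy_itimer *it)
{
    return (struct toy_ptimer *)
        ((char *)it - offsetof(struct toy_ptimer, pt_itimer));
}

static void
toy_ptimer_fire(struct toy_itimer *it)
{
    it->it_fired++;
}

static const struct toy_itimer_ops toy_ptimer_ops = {
    .ito_fire = toy_ptimer_fire,
    .ito_realtime_changed = NULL, /* this toy kind does not care */
};

/* Generic code sees only the common part and dispatches through the ops. */
static void
toy_itimer_expire(struct toy_itimer *it)
{
    it->it_ops->ito_fire(it);
}
```

The point of the layout is that generic expiry code needs no knowledge of per-process timers, so another timer type could supply its own ops table.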

Welcome to NetBSD 9.99.77.


Revision tags: thorpej-futex-base
# 1.367 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.366 23-May-2020 ad

- Replace pid_table_lock with a lockless lookup covered by pserialize, with
the "writer" side being pid_table expansion. The basic idea is that when
doing an LWP lookup there is usually already a lock held (p->p_lock), or a
spin mutex that needs to be taken (l->l_mutex), and either can be used to
get the found LWP stable and confidently determine that all is correct.

- For user processes LSLARVAL implies the same thing as LSIDL ("not visible
by ID"), and lookup by ID in proc0 doesn't really happen. In-tree the new
state should be understood by top(1), the tty subsystem and so on, and
would attract the attention of 3rd party kernel grovellers in time, so
remove it and just rely on LSIDL.


# 1.365 07-May-2020 kamil

On debugger attach to a prestarted process don't report SIGTRAP

Introduce PSL_TRACEDCHILD that indicates tracking of birth of a process.
A freshly forked process checks whether it is traced and if so, reports
SIGTRAP + TRAP_CHLD event to a debugger as a result of tracking forks-like
events. There is a time window when a debugger can attach to a newly
created process and receive SIGTRAP + TRAP_CHLD instead of SIGSTOP.

Fixes races in t_ptrace_wait* tests when a test hangs or misbehaves,
especially the ones reported in tracer_sysctl_lookup_without_duplicates.


# 1.364 29-Apr-2020 thorpej

- proc_find() retains traditional semantics of requiring the canonical
PID to look up a proc. Add a separate proc_find_lwpid() to look up a
proc by the ID of any of its LWPs.
- Add proc_find_lwp_acquire_proc(), which enables looking up the LWP
*and* a proc given the ID of any LWP. Returns with the proc::p_lock
held.
- Rewrite lwp_find2() in terms of proc_find_lwp_acquire_proc(), and
allow the proc to be wildcarded, rather than just curproc or specific
proc.
- lwp_find2() now subsumes the original intent of lwp_getref_lwpid(), but
in a much nicer way, so garbage-collect the remnants of that recently
added mechanism.


Revision tags: bouyer-xenpvh-base2
# 1.363 24-Apr-2020 thorpej

Overhaul the way LWP IDs are allocated. Instead of each LWP having its
own LWP ID space, LWP IDs now come from the same number space as PIDs. The
lead LWP of a process gets the PID as its LID. If a multi-LWP process's
lead LWP exits, the PID persists for the process.

In addition to providing system-wide unique thread IDs, this also lets us
eliminate the per-process LWP radix tree, and some associated locks.

Remove the separate "global thread ID" map added previously; it is no longer
needed to provide this functionality.

Nudged in this direction by ad@ and chs@.


Revision tags: phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.362 06-Apr-2020 kamil

branches: 1.362.2;
Reintroduce struct proc::p_oppid

Relying on p_opptr is not safe as there is a race between:
- spawner giving birth to a child process and being killed
- spawnee accessing p_opptr and reporting TRAP_CHLD

PR kern/54786 by Andreas Gustafsson


# 1.361 05-Apr-2020 christos

There is no "s" lock.


# 1.360 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.359 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.358 29-Jan-2020 ad

- Track LWPs in a per-process radixtree. It uses no extra memory in the
single threaded case. Replace scans of p->p_lwps with lookups in the
tree. Find free LIDs for new LWPs in the tree. Replace the hashed sleep
queues for park/unpark with lookups in the tree under cover of a RW lock.

- lwp_wait(): if waiting on a specific LWP, find the LWP via tree lookup and
return EINVAL if it's detached, not ESRCH.

- Group the locks in struct proc at the end of the struct in their own cache
line.

- Add some comments.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.357 12-Oct-2019 kamil

branches: 1.357.2;
Remove now unused p_oppid from struct proc


# 1.356 30-Sep-2019 kamil

Move TRAP_CHLD/TRAP_LWP ptrace information from struct proc to siginfo

Storing struct ptrace_state information inside struct proc was vulnerable
to synchronization bugs, as multiple events emitted at the same time were
overwriting one another.

Cache the original parent process id in p_oppid. Reusing p_opptr here is
in theory prone to a slight race condition.

Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for calls
prompting for the value in cases when there wasn't registered an
appropriate event.

Add an alternative approach to check the ptrace_state information, directly
from the siginfo_t value returned from PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compat with older NetBSD and
OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.

Add a couple of compile-time asserts for assumptions in the code.

No functional change intended in existing ptrace(2) software.

All ATF ptrace(2) and ATF GDB tests pass.

This change improves reliability of the threading ptrace(2) code.


Revision tags: netbsd-9-1-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.355 15-Jul-2019 pgoyette

Move a comment line to get it next to the line it describes, avoiding
intervening unrelated text.

NFCI


# 1.354 21-Jun-2019 kamil

Eliminate PS_NOTIFYSTOP remnants from the kernel

This flag used to be useful in /proc (BSD4.4-style) debugging semantics.
Traced child events were notified without signaling the parent.

This property was removed in NetBSD-8.0 and had no users.

This change simplifies the signal code, removing dead branches.

NFCI


# 1.353 11-Jun-2019 kamil

Add support for PTRACE_POSIX_SPAWN to report posix_spawn(3) events

posix_spawn(3) is a first-class syscall in NetBSD, distinct from
(V)FORK+EXEC, as these operations are executed in one go. This differs from
Linux and FreeBSD, where posix_spawn(3) is implemented with existing kernel
primitives (clone(2), vfork(2), exec(3)) inside libc.

Typically LLDB and GDB software is aware of FORK/VFORK events. As discussed
with the LLDB community, instead of slicing the posix_spawn(3) operation
into phases emulating (V)FORK+EXEC(+VFORK_DONE) and returning intermediate
state to the debugger, that might have abnormal state, introduce new event
type: PTRACE_POSIX_SPAWN.

A debugger implementor can easily map it into existing fork+exec semantics
or treat as a distinct event.

There is no functional change for existing debuggers as there was no
support for reporting posix_spawn(3) events on the kernel side.


Revision tags: phil-wifi-20190609 isaki-audio2-base
# 1.352 06-Apr-2019 kamil

Centralized shared part of child_return() into MI part

Add a new function md_child_return() for MD specific bits only.

New child_return() is now part of MI and central code that handles
uniformly tracing code (KTR and ptrace(2)).

Synchronize value passed to ktrsysret() among ports to SYS_fork. This is
a traditional value and accessing p_lflag to check for PL_PPWAIT shall
use locking against proc_lock. Returning SYS_fork vs SYS_vfork still isn't
correct enough as there are more entry points to forking code. Instead of
making it too good, just settle with plain SYS_fork for all ports.


# 1.351 01-Mar-2019 christos

PR/53998: Joel Bertrand: Limit the number of semaphores on a
per-user basis not a per-process. We cannot really keep track on
a per-process basis because a parent process can create the semaphore
and a child can free it taking credit for it. There is also a
similar issue about resource exhaustion if we limited the number
of lwps per process as opposed to per user (which we don't).


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.350 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
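
The tri-state policy listed above can be sketched as a small decision function. This is only an illustration under stated assumptions: the enum names and both helper functions are hypothetical, and the real check is performed by get_expose_address(), whose signature is not shown in this log.

```c
#include <stdbool.h>

/* Hypothetical names for the three settings described in the log. */
enum toy_expose_address {
    TOY_EXPOSE_NONE = 0, /* no access */
    TOY_EXPOSE_KMEM = 1, /* only processes with /dev/kmem open */
    TOY_EXPOSE_ALL  = 2, /* access for everyone */
};

/* Default from the log: 0 on KASLR kernels, 1 on non-KASLR kernels. */
static enum toy_expose_address
toy_expose_address_default(bool kaslr)
{
    return kaslr ? TOY_EXPOSE_NONE : TOY_EXPOSE_KMEM;
}

/* Decide once per request whether kernel addresses may be shown. */
static bool
toy_may_expose_address(enum toy_expose_address setting, bool has_kmem_open)
{
    switch (setting) {
    case TOY_EXPOSE_ALL:
        return true;
    case TOY_EXPOSE_KMEM:
        return has_kmem_open;
    case TOY_EXPOSE_NONE:
    default:
        return false;
    }
}
```

Evaluating the decision once per sysctl request, rather than once per process in the result, mirrors the efficiency point made in the entry.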


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.349 10-Aug-2018 pgoyette

Allow syscall_establish() to install new syscalls when the existing
entry-point is either sys_nomodule or sys_nosys. Update the
makesyscalls.sh script to create a const array of bits to allow
syscall_disestablish() to properly restore the original entry-point.
Update all the initializers of struct emul to initialize the pointer
to the bit array struct emul.
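
As a rough illustration of the mechanism described above, the following toy table accepts a new handler only when the slot currently holds one of the two placeholder entry points, and consults a constant bit array to restore the right placeholder afterwards. All `toy_` names, the table size, and the error codes chosen are assumptions made for this sketch; the real table and bit array are generated by makesyscalls.sh.

```c
#include <errno.h>
#include <stdint.h>

typedef int (*toy_sy_call_t)(void);

/* Two placeholder entry points plus one example handler. */
static int toy_sys_nosys(void)    { return ENOSYS; }
static int toy_sys_nomodule(void) { return ENOSYS; }
static int toy_sys_example(void)  { return 0; }

#define TOY_NSYSENT 8

static toy_sy_call_t toy_sysent[TOY_NSYSENT] = {
    toy_sys_nosys, toy_sys_nomodule, toy_sys_nosys, toy_sys_nomodule,
    toy_sys_nosys, toy_sys_nosys,    toy_sys_nosys, toy_sys_nosys,
};

/* Bit n set means slot n originally pointed at toy_sys_nomodule. */
static const uint32_t toy_nomodbits[(TOY_NSYSENT + 31) / 32] = {
    (1u << 1) | (1u << 3),
};

/* Install fn only if the slot still holds a placeholder. */
static int
toy_syscall_establish(unsigned n, toy_sy_call_t fn)
{
    if (n >= TOY_NSYSENT)
        return EINVAL;
    if (toy_sysent[n] != toy_sys_nosys && toy_sysent[n] != toy_sys_nomodule)
        return EBUSY;
    toy_sysent[n] = fn;
    return 0;
}

/* Put back whichever placeholder the slot held originally. */
static void
toy_syscall_disestablish(unsigned n)
{
    if (n >= TOY_NSYSENT)
        return;
    toy_sysent[n] = (toy_nomodbits[n / 32] & (1u << (n % 32)))
        ? toy_sys_nomodule : toy_sys_nosys;
}
```

Without the bit array, the removal path could not tell which of the two placeholders to reinstate once the slot had been overwritten.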

XXX Regen of all files created by makesyscalls.sh will come soon,
XXX followed by a kernel version bump (since struct emul is being
XXX modified).

This commit should address PR kern/45781 and also removes the need
for the work-around for that PR in file

sys/arch/usermode/modules/syscallemu/syscallemu.c


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.348 09-May-2018 kre

branches: 1.348.2;

Cause a process's user and system times to become non-decreasing.

This alters the invented values (ie: statistically calculated)
that are returned - for small values, the values are likely going to
be different than they were, but that's largely nonsense anyway
(except that the sum of utime & stime does equal cpu time consumed
by the process). Once the values get large enough to be meaningful
the difference made by this change will be in the noise, and irrelevant.

This needs a couple of additions to struct proc, so we are now into 8.99.17
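
One simple way to obtain non-decreasing values is to remember the last values reported and never go below them. The sketch below shows only that idea; it is not the algorithm committed here, the structure and function names are hypothetical, and this plain clamp does not by itself preserve the "utime + stime equals CPU time consumed" property mentioned above.

```c
#include <stdint.h>

/* Hypothetical per-process record of the last values handed out. */
struct toy_ru_history {
    uint64_t last_utime_us;
    uint64_t last_stime_us;
};

/*
 * Clamp freshly estimated times against the previously reported ones,
 * so that successive callers never observe a value going backwards.
 */
static void
toy_report_times(struct toy_ru_history *h,
    uint64_t est_utime_us, uint64_t est_stime_us,
    uint64_t *utime_us, uint64_t *stime_us)
{
    if (est_utime_us < h->last_utime_us)
        est_utime_us = h->last_utime_us;
    if (est_stime_us < h->last_stime_us)
        est_stime_us = h->last_stime_us;

    h->last_utime_us = est_utime_us;
    h->last_stime_us = est_stime_us;
    *utime_us = est_utime_us;
    *stime_us = est_stime_us;
}
```

The two remembered values correspond in spirit to the "couple of additions to struct proc" that the entry mentions.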


# 1.347 06-May-2018 kamil

Remove an element from struct emul: e_tracesig

e_tracesig used to be implemented for Darwin compat. Nowadays the Darwin
compatibility layer is gone and there are no other users.

This functionality isn't used where it shall be used in the existing
codebase.

If we want to emulate debugging interfaces in compat layers we would need
to implement that from scratch anyway. We would need to be bug compatible
with other OSes too.

Proposed on tech-kern@.

Welcome to NetBSD 8.99.16!

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422
# 1.346 19-Apr-2018 christos

s/static inline/static __inline/g for consistency with other include
headers.


# 1.345 16-Apr-2018 kamil

Remove the rnewprocp argument from fork1(9)

It's now unused and it can cause use-after-free scenarios as noted by
<Mateusz Guzik>.

Reference: http://mail-index.netbsd.org/tech-kern/2017/09/08/msg022267.html

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.344 09-Jan-2018 maya

branches: 1.344.2;
remove struct emul's e_fault.

It used to be used by COMPAT_IRIX for the purpose of overriding
uvm_fault (only implemented in MIPS), now removed.

Ride 8.99.12 version bump.


Revision tags: tls-maxphys-base-20171202
# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

branches: 1.340.6;
factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API and its usage.

This replaces the previous watchpoint API. The previous one was introduced
only recently in NetBSD-current, so remove its remnants without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel
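
To illustrate the point above that the debugger must inspect the status register itself, here is a minimal sketch of decoding DR6 on x86, where bits 0 to 3 (B0 to B3) flag which of DR0 to DR3 matched. The function name is hypothetical, and how the DR6 value is extracted from the structure returned by PT_GETDBREGS is left out, since that layout is not shown in this log.

```c
#include <stdint.h>

/*
 * Return the index (0..3) of the lowest-numbered debug address register
 * whose condition is flagged in DR6 bits B0..B3, or -1 if none is set.
 */
static int
toy_dr6_triggered_index(uint64_t dr6)
{
    for (int i = 0; i < 4; i++) {
        if (dr6 & (UINT64_C(1) << i))
            return i;
    }
    return -1;
}
```

A tracer would typically call something like this after receiving SIGTRAP with si_code TRAP_DBREG, then look up its own bookkeeping for the watchpoint stored in that register.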

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
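
The size claim can be checked mechanically. The sketch below restates both layouts with stand-in types (`toy_pid_t` and `toy_lwpid_t` are assumptions standing for the 32-bit pid_t and lwpid_t the entry describes, and the `_old`/`_new` struct names are invented for the comparison) and lets the compiler verify that size and member offset are unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel types; the log states both are int32_t-like. */
typedef int32_t toy_pid_t;
typedef int32_t toy_lwpid_t;

struct ptrace_state_old {
    int pe_report_event;
    toy_pid_t pe_other_pid;
};

struct ptrace_state_new {
    int pe_report_event;
    union {
        toy_pid_t _pe_other_pid;
        toy_lwpid_t _pe_lwp;
    } _option;
};

/* Same size and same offset: existing binaries keep working. */
_Static_assert(sizeof(struct ptrace_state_old) ==
    sizeof(struct ptrace_state_new), "size must not change");
_Static_assert(offsetof(struct ptrace_state_old, pe_other_pid) ==
    offsetof(struct ptrace_state_new, _option), "offset must not change");

/* Accessor macros keep old source code compiling unchanged. */
#define pe_other_pid _option._pe_other_pid
#define pe_lwp       _option._pe_lwp

/* Example consumer of the new member. */
static toy_lwpid_t
toy_event_lwp(const struct ptrace_state_new *ps)
{
    return ps->pe_lwp;
}
```

Because both union members have the same underlying type, code that still writes pe_other_pid and code that reads pe_lwp see the same storage.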


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP termination is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
parent gives birth to new process child and stops till it exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE notifies a debugger that a parent has
resumed after vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
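
A macro of this kind might look roughly like the sketch below. Only the purpose (choosing the pointer size of the traced or inspected process rather than that of the kernel) comes from the entry; the `TOY_` names, the flag value, and the toy structure are assumptions, not the definitions in <sys/proc.h>.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PK_32 0x1 /* hypothetical flag: process runs as 32-bit */

struct toy_proc {
    int p_flag;
};

/* Pointer size as seen by the process, not by the (64-bit) kernel. */
#define TOY_PROC_PTRSZ(p) \
    (((p)->p_flag & TOY_PK_32) ? sizeof(uint32_t) : sizeof(void *))

/*
 * Bytes occupied in the process's address space by a vector of n
 * pointers followed by a NULL terminator (an argv-style array).
 */
static size_t
toy_vector_size(const struct toy_proc *p, size_t n)
{
    return (n + 1) * TOY_PROC_PTRSZ(p);
}
```

Code that copies pointer arrays to or from userland would use such a size instead of a bare sizeof(void *), which is wrong for a 32-bit process on a 64-bit kernel.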


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
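
With the three pieces stored separately, a composite wait(2)-style status can be rebuilt on demand. The sketch below assumes the traditional BSD encoding (exit code in bits 8 to 15, signal number in the low 7 bits, core flag 0x80); the structure, field, and function names are hypothetical and only mirror p_xexit, p_xsig, and the WCOREFLAG bit named above.

```c
#include <stdbool.h>

struct toy_exit_info {
    int xexit;     /* exit code, cf. p_xexit */
    int xsig;      /* terminating signal or 0, cf. p_xsig */
    bool coredump; /* cf. the WCOREFLAG bit in p_sflag */
};

#define TOY_WCOREFLAG 0x80 /* assumed traditional value */

/* Rebuild a composite status word from the separated fields. */
static int
toy_make_wait_status(const struct toy_exit_info *ei)
{
    if (ei->xsig != 0) {
        return (ei->xsig & 0x7f) |
            (ei->coredump ? TOY_WCOREFLAG : 0);
    }
    return (ei->xexit & 0xff) << 8;
}
```

Keeping the fields apart means kernel code no longer has to guess from context whether a stored value is a status word or a bare signal number.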


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no CPU cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
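
As a rough illustration of what such shorthand macros do, the sketch below
defines three hypothetical macros (`SK_PACKED`, `SK_UNUSED`, `SK_DEAD`) in
terms of the GCC/Clang `__attribute__` syntax. The `SK_*` names, the struct,
and the functions are invented for this example so they cannot collide with
the real cdefs.h definitions; a compiler that supports these attributes is
assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical shorthand macros, modelled on the cdefs.h style. */
#define SK_PACKED	__attribute__((__packed__))
#define SK_UNUSED	__attribute__((__unused__))
#define SK_DEAD		__attribute__((__noreturn__))

/* Packed: no padding is inserted between 'type' and 'len'. */
struct sk_hdr {
	uint8_t type;
	uint32_t len;
} SK_PACKED;

static_assert(sizeof(struct sk_hdr) == 5, "sk_hdr must not be padded");

/* Dead: the compiler may assume this function never returns. */
static SK_DEAD void
sk_fatal(void)
{
	abort();
}

/* Unused: 'flags' is accepted but deliberately ignored, without warnings. */
static int
sk_hdr_len(const struct sk_hdr *h, int flags SK_UNUSED)
{
	if (h == NULL)
		sk_fatal();
	return (int)h->len;
}
```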


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.
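
A minimal sketch of maintaining a global counter with atomics rather than
under a lock follows. It uses C11 <stdatomic.h> as a stand-in, not the
kernel's own atomic primitives, and the names `sketch_nprocs`,
`sketch_proc_created()` and `sketch_proc_reaped()` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical global process counter; stands in for nprocs. */
static atomic_uint sketch_nprocs;

/* Account for a newly created process; returns the new count. */
static unsigned int
sketch_proc_created(void)
{
	return atomic_fetch_add(&sketch_nprocs, 1) + 1;
}

/* Account for a reaped process; returns the new count. */
static unsigned int
sketch_proc_reaped(void)
{
	unsigned int old = atomic_fetch_sub(&sketch_nprocs, 1);

	assert(old > 0);	/* the counter must never underflow */
	return old - 1;
}
```

Because each update is a single atomic read-modify-write, concurrent callers
do not need to hold a lock just to keep the count consistent.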


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still needs work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout from the proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but do no more.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
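
The effect described above can be sketched numerically. The decay factor
2*load / (2*load + 1) and the 11 fraction bits are assumptions made for this
illustration (they follow the classic BSD scheduler convention and are not
stated in the commit message), and the `sk_*` names are hypothetical. Each
call models one second in which a process received `ticks` clock ticks and
was then decayed once.

```c
#include <assert.h>
#include <stdint.h>

#define SK_FSHIFT	11			/* assumed fraction bits */
#define SK_FSCALE	(1u << SK_FSHIFT)	/* fixed-point 1.0 */

/*
 * Decay an estimate by 2*load / (2*load + 1), truncating toward zero.
 * The same arithmetic serves both representations; only the scaling of
 * 'est' differs.
 */
static uint32_t
sk_decay(uint32_t est, uint32_t load)
{
	assert(load > 0);
	return (uint32_t)(((uint64_t)est * 2 * load) / (2 * load + 1));
}

/* One second with 'ticks' clock ticks; est is a plain integer. */
static uint32_t
sk_second_int(uint32_t est, uint32_t ticks, uint32_t load)
{
	return sk_decay(est + ticks, load);
}

/* One second with 'ticks' clock ticks; est is scaled by SK_FSCALE. */
static uint32_t
sk_second_fixpt(uint32_t est, uint32_t ticks, uint32_t load)
{
	return sk_decay(est + ticks * SK_FSCALE, load);
}
```

With a load of 20 and one tick per second, the integer form truncates 40/41
to 0 every second, so the estimate never leaves zero. The scaled form keeps
the fractional remainder (1998/2048 after the first second) and accumulates
from there.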


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
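
The idea of a foreach macro that skips marker entries can be sketched as
follows. This is not the real PROCLIST_FOREACH definition: the list is a
hand-rolled singly linked list, and `sk_proc`, `is_marker` and
`SK_PROCLIST_FOREACH` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list entry: either a real process or a marker. */
struct sk_proc {
	struct sk_proc *next;
	int is_marker;		/* non-zero for iteration markers */
	int pid;
};

/* Advance past any marker entries; returns NULL at end of list. */
static struct sk_proc *
sk_skip_markers(struct sk_proc *p)
{
	while (p != NULL && p->is_marker)
		p = p->next;
	return p;
}

/* Like a plain list foreach, but the body never sees a marker. */
#define SK_PROCLIST_FOREACH(var, head)					\
	for ((var) = sk_skip_markers(head);				\
	    (var) != NULL;						\
	    (var) = sk_skip_markers((var)->next))

/* Count the real (non-marker) entries on the list. */
static int
sk_count_procs(struct sk_proc *head)
{
	struct sk_proc *p;
	int n = 0;

	SK_PROCLIST_FOREACH(p, head) {
		assert(!p->is_marker);
		n++;
	}
	return n;
}
```

A marker left on the list by a suspended traversal is therefore invisible to
every other caller that iterates with the macro.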


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep while awaiting the debugger's opinion
about the signal without causing a problem. The hook is hence located in
issignal, at the place where SIGCHLD is normally sent to the debugger
and the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception), and stop the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.
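
The dispatch rule described above (use the caller's process if one is given, otherwise pick from the run queue, and short-circuit when switching to self) can be sketched roughly as follows. This is a toy model only: `toy_proc`, `toy_runq`, `toy_chooseproc()` and `toy_switch()` are hypothetical names, not the kernel's, and the stand-in for chooseproc() returns NULL on an empty queue instead of waiting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced process record. */
struct toy_proc {
	int pid;
	struct toy_proc *next;	/* run-queue link */
};

/* Hypothetical single run queue; the real kernel keeps several. */
struct toy_proc *toy_runq;

/*
 * Stand-in for chooseproc(): take the head of the run queue.
 * The real function waits for a process to appear; this sketch
 * simply returns NULL when the queue is empty.
 */
struct toy_proc *
toy_chooseproc(void)
{
	struct toy_proc *p = toy_runq;

	if (p != NULL)
		toy_runq = p->next;
	return p;
}

/* Stand-in for the switch path: returns the process that runs next. */
struct toy_proc *
toy_switch(struct toy_proc *cur, struct toy_proc *newp)
{
	if (newp == NULL)
		newp = toy_chooseproc();	/* no explicit target given */
	if (newp == cur)
		return cur;	/* "switching to self": nothing to save or restore */
	return newp;
}
```

Passing the current process as the target leaves the run queue untouched, which is the case the commit message singles out as an optimisation opportunity.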


# 1.145 21-Sep-2002 manu

- Introduce a e_fault field in struct proc to provide emulation specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulation
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and to synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specifics should now be handled in OS specific code. Linux
has been done, but other emulations should be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data that has been collected so far:

                                   Net Free Open Linux SunOS AIX OSF1 Darwin
send SIGIO to write end of pipe    Y   N    N    N     N     N   Y    Y
send SIGIO to read end of pipe     Y   Y    N    N     N     ?   Y    ?
send SIGIO to write end of socket  Y   Y    Y    N     N     Y   Y    Y
send SIGIO to read end of socket   Y   Y    Y    Y     Y     ?   Y    ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke several
emulation package builds


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
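
A compatibility macro of the kind described above could look roughly like the sketch below: the old four-argument call forwards to the new five-argument function with no interlock. Everything prefixed `toy_` is a hypothetical stand-in; the stub does not sleep and only records what it was given, and the argument list is an assumption based on the traditional tsleep() parameters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's simple lock. */
struct toy_simplelock {
	int locked;
};

/* Records the interlock seen by the last call, for illustration only. */
struct toy_simplelock *toy_last_interlock;

/*
 * Stand-in for ltsleep(). A real implementation would lock the
 * scheduler, release the interlock, and block the caller; this
 * stub only releases the interlock and returns immediately.
 */
int
toy_ltsleep(void *chan, int pri, const char *wmesg, int timo,
    struct toy_simplelock *interlock)
{
	(void)chan;
	(void)pri;
	(void)wmesg;
	(void)timo;

	toy_last_interlock = interlock;
	if (interlock != NULL)
		interlock->locked = 0;
	return 0;
}

/* Old-style entry point: same arguments as before, no interlock. */
#define toy_tsleep(chan, pri, wmesg, timo) \
	toy_ltsleep((chan), (pri), (wmesg), (timo), NULL)
```

Existing callers keep compiling unchanged, while new callers that need the race-free behaviour pass their lock explicitly.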


# 1.97 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.
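
The idea behind a "bitmap of non-empty run queues" can be sketched as below: one bit per queue lets the scheduler find work without inspecting each queue. The queue count of 32, the `toy_` names, and the convention that a lower index is searched first are all assumptions for illustration, not taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NQS 32		/* hypothetical number of run queues */

/* Bit q is set while run queue q holds at least one process. */
uint32_t toy_whichqs;

void
toy_mark_nonempty(int q)
{
	toy_whichqs |= (uint32_t)1 << q;
}

void
toy_mark_empty(int q)
{
	toy_whichqs &= ~((uint32_t)1 << q);
}

/* Index of the lowest-numbered non-empty queue, or -1 if all are empty. */
int
toy_first_nonempty(void)
{
	for (int q = 0; q < TOY_NQS; q++) {
		if (toy_whichqs & ((uint32_t)1 << q))
			return q;
	}
	return -1;
}
```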


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.
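
The element-size argument mentioned for KERN_PROC2 is what lets an old binary keep working after the structure grows. A minimal sketch of that idea, assuming new fields are only ever appended, is shown below; `toy_kinfo` and `toy_export()` are hypothetical and much smaller than the real interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record; the real struct kinfo_proc2 is far larger. */
struct toy_kinfo {
	int pid;
	int ppid;
	int added_later;	/* field an old binary does not know about */
};

/*
 * Copy up to `count` records into `out`, each occupying exactly
 * `elemsize` bytes as requested by the caller. A caller built against
 * a smaller structure gets a truncated prefix of each record; a caller
 * asking for more than we have gets zero padding.
 * Returns the number of records written.
 */
size_t
toy_export(const struct toy_kinfo *src, size_t nsrc, void *out,
    size_t elemsize, size_t count)
{
	size_t n = nsrc < count ? nsrc : count;
	size_t copy = elemsize < sizeof(*src) ? elemsize : sizeof(*src);
	char *dst = out;

	for (size_t i = 0; i < n; i++) {
		memset(dst + i * elemsize, 0, elemsize);
		memcpy(dst + i * elemsize, &src[i], copy);
	}
	return n;
}
```

Because the stride in the output buffer is the caller's element size rather than the kernel's, the caller can index the result with its own structure definition.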


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
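
The sizing rule described above (a quarter of physical memory, then clamped by machine-dependent bounds) amounts to something like the following. The function name and parameters are hypothetical, and the order in which the two bounds are applied is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: derive the kmem_map size in pages from the
 * number of physical pages, then clamp it to [min_pages, max_pages].
 */
size_t
toy_nkmempages(size_t physpages, size_t min_pages, size_t max_pages)
{
	size_t n = physpages / 4;

	if (n > max_pages)
		n = max_pages;	/* upper bound, e.g. NKMEMPAGES_MAX */
	if (n < min_pages)
		n = min_pages;	/* lower bound, e.g. NKMEMPAGES_MIN */
	return n;
}
```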


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). The process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock every where else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
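
A predicate macro of the kind described above might be written as in this sketch. The state values and the `TOY_`/`toy_` names are hypothetical; only the logic (either state counts as a zombie) comes from the commit message.

```c
#include <assert.h>

/* Hypothetical state values; the kernel's actual numbers may differ. */
enum toy_pstat {
	TOY_SRUN = 2,	/* runnable */
	TOY_SZOMB = 5,	/* reaped by the reaper, awaiting wait4 */
	TOY_SDEAD = 6	/* dead, not yet processed by the reaper */
};

/* Hypothetical, heavily reduced process record. */
struct toy_proc {
	int p_stat;
};

/* True for a process in either of the two post-exit states. */
#define TOY_P_ZOMBIE(p) \
	((p)->p_stat == TOY_SZOMB || (p)->p_stat == TOY_SDEAD)
```

Callers that previously compared against SZOMB alone can use the macro so that the intermediate SDEAD state is not mistaken for a live process.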


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is the
header files as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.
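
A table of list descriptors like the one described above lets code walk
every process list with one loop. The following is only a minimal sketch
under stated assumptions: every name ending in _sketch, the pd_list field,
the NULL terminator and the singly linked list are illustrative, not the
kernel's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for struct proc: just a PID and a link. */
struct proc_sketch {
	int p_pid;
	struct proc_sketch *p_next;
};

struct proclist_sketch {
	struct proc_sketch *lh_first;
};

/* Descriptor: points at a list; a lock pointer could be added later. */
struct proclist_desc_sketch {
	struct proclist_sketch *pd_list;
};

static struct proclist_sketch allproc_sketch, deadproc_sketch, zombproc_sketch;

static const struct proclist_desc_sketch proclists_sketch[] = {
	{ &allproc_sketch },
	{ &deadproc_sketch },
	{ &zombproc_sketch },
	{ NULL },		/* terminator (an assumption of this sketch) */
};

/* Search every list named in the descriptor table for a PID. */
static struct proc_sketch *
find_in_any_list(int pid)
{
	const struct proclist_desc_sketch *pd;
	struct proc_sketch *p;

	assert(pid >= 0);
	for (pd = proclists_sketch; pd->pd_list != NULL; pd++)
		for (p = pd->pd_list->lh_first; p != NULL; p = p->p_next)
			if (p->p_pid == pid)
				return p;
	return NULL;
}
```

The point of the indirection is that a new list can be added to the table
without touching any of the loops that iterate over it.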


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.
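
The idea of keeping the session ID separately from the leader pointer can
be sketched as below. This is an illustration only: the struct layouts,
the s_sid field name and the helper functions are assumptions made for the
example; only s_leader is named in the message above.

```c
#include <assert.h>
#include <stddef.h>

struct proc_sketch {
	int p_pid;
};

struct session_sketch {
	struct proc_sketch *s_leader;	/* becomes NULL when the leader exits */
	int s_sid;			/* session ID, copied at creation */
};

static void
session_init(struct session_sketch *s, struct proc_sketch *leader)
{
	assert(leader != NULL);
	s->s_leader = leader;
	s->s_sid = leader->p_pid;	/* copy now; the leader may go away */
}

static void
session_leader_exit(struct session_sketch *s)
{
	s->s_leader = NULL;
}

/* The ID stays available even when there is no leader any more. */
static int
session_id(const struct session_sketch *s)
{
	return s->s_sid;
}
```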


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
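
The offset scheme can be sketched as follows. This is a minimal
illustration rather than the kernel's code: the helper names are
hypothetical, and the -20..20 user-visible range is assumed from the
traditional PRIO_MIN/PRIO_MAX values.

```c
#include <assert.h>

#define NZERO_SKETCH	20	/* offset named in the message above */
#define NICE_MIN_SKETCH	(-20)	/* assumed user-visible minimum */
#define NICE_MAX_SKETCH	20	/* assumed user-visible maximum */

/* Store a user nice value as a non-negative unsigned char. */
static unsigned char
nice_to_p_nice(int nice)
{
	assert(nice >= NICE_MIN_SKETCH && nice <= NICE_MAX_SKETCH);
	return (unsigned char)(nice + NZERO_SKETCH);
}

/* Recover the user-visible nice value from the stored byte. */
static int
p_nice_to_nice(unsigned char p_nice)
{
	return (int)p_nice - NZERO_SKETCH;
}
```

With an offset of 20 the stored value never goes below zero, which is why
an unsigned type is safe for the field.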


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.
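
A hold count of this kind might look like the sketch below. The macro and
struct names carry a _SKETCH/_sketch suffix because they are illustrative;
the p_holdcnt field name and the swappability check are assumptions, not
taken from this revision.

```c
#include <assert.h>

struct proc_sketch {
	int p_holdcnt;		/* number of outstanding holds */
};

/* Take a hold: the process must stay in core until released. */
#define PHOLD_SKETCH(p)	((p)->p_holdcnt++)

/* Drop a hold; releasing more often than holding is a bug. */
#define PRELE_SKETCH(p)	do {						\
	assert((p)->p_holdcnt > 0);					\
	(p)->p_holdcnt--;						\
} while (0)

/* Only a process with no holds may be swapped out. */
static int
proc_is_swappable(const struct proc_sketch *p)
{
	return p->p_holdcnt == 0;
}
```

Unlike a single flag bit, a counter lets several independent users hold
the same process without one release cancelling another's hold.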


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location, and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.367 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.366 23-May-2020 ad

- Replace pid_table_lock with a lockless lookup covered by pserialize, with
the "writer" side being pid_table expansion. The basic idea is that when
doing an LWP lookup there is usually already a lock held (p->p_lock), or a
spin mutex that needs to be taken (l->l_mutex), and either can be used to
get the found LWP stable and confidently determine that all is correct.

- For user processes LSLARVAL implies the same thing as LSIDL ("not visible
by ID"), and lookup by ID in proc0 doesn't really happen. In-tree the new
state should be understood by top(1), the tty subsystem and so on, and
would attract the attention of 3rd party kernel grovellers in time, so
remove it and just rely on LSIDL.


# 1.365 07-May-2020 kamil

On debugger attach to a prestarted process don't report SIGTRAP

Introduce PSL_TRACEDCHILD that indicates tracking of birth of a process.
A freshly forked process checks whether it is traced and if so, reports
SIGTRAP + TRAP_CHLD event to a debugger as a result of tracking forks-like
events. There is a time window when a debugger can attach to a newly
created process and receive SIGTRAP + TRAP_CHLD instead of SIGSTOP.

Fixes races in t_ptrace_wait* tests when a test hangs or misbehaves,
especially the ones reported in tracer_sysctl_lookup_without_duplicates.


# 1.364 29-Apr-2020 thorpej

- proc_find() retains traditional semantics of requiring the canonical
PID to look up a proc. Add a separate proc_find_lwpid() to look up a
proc by the ID of any of its LWPs.
- Add proc_find_lwp_acquire_proc(), which enables looking up the LWP
*and* a proc given the ID of any LWP. Returns with the proc::p_lock
held.
- Rewrite lwp_find2() in terms of proc_find_lwp_acquire_proc(), and
allow the proc to be wildcarded, rather than just curproc or specific
proc.
- lwp_find2() now subsumes the original intent of lwp_getref_lwpid(), but
in a much nicer way, so garbage-collect the remnants of that recently
added mechanism.


Revision tags: bouyer-xenpvh-base2
# 1.363 24-Apr-2020 thorpej

Overhaul the way LWP IDs are allocated. Instead of each LWP having its
own LWP ID space, LWP IDs now come from the same number space as PIDs. The
lead LWP of a process gets the PID as its LID. If a multi-LWP process's
lead LWP exits, the PID persists for the process.

In addition to providing system-wide unique thread IDs, this also lets us
eliminate the per-process LWP radix tree, and some associated locks.

Remove the separate "global thread ID" map added previously; it is no longer
needed to provide this functionality.

Nudged in this direction by ad@ and chs@.


Revision tags: phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.362 06-Apr-2020 kamil

branches: 1.362.2;
Reintroduce struct proc::p_oppid

Relying on p_opptr is not safe as there is a race between:
- spawner giving a birth to a child process and being killed
- spawnee accessing p_opptr and reporting TRAP_CHLD

PR kern/54786 by Andreas Gustafsson


# 1.361 05-Apr-2020 christos

There is no "s" lock.


# 1.360 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.359 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.358 29-Jan-2020 ad

- Track LWPs in a per-process radixtree. It uses no extra memory in the
single threaded case. Replace scans of p->p_lwps with lookups in the
tree. Find free LIDs for new LWPs in the tree. Replace the hashed sleep
queues for park/unpark with lookups in the tree under cover of a RW lock.

- lwp_wait(): if waiting on a specific LWP, find the LWP via tree lookup and
return EINVAL if it's detached, not ESRCH.

- Group the locks in struct proc at the end of the struct in their own cache
line.

- Add some comments.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.357 12-Oct-2019 kamil

branches: 1.357.2;
Remove now unused p_oppid from struct proc


# 1.356 30-Sep-2019 kamil

Move TRAP_CHLD/TRAP_LWP ptrace information from struct proc to siginfo

Storing struct ptrace_state information inside struct proc was vulnerable
to synchronization bugs, as multiple events emitted in the same time were
overwriting other ones.

Cache the original parent process id in p_oppid. Reusing here p_opptr is
in theory prone to slight race condition.

Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for calls
prompting for the value in cases when there wasn't registered an
appropriate event.

Add an alternative approach to check the ptrace_state information, directly
from the siginfo_t value returned from PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compat with older NetBSD and
OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.

Add a couple of compile-time asserts for assumptions in the code.

No functional change intended in existing ptrace(2) software.

All ATF ptrace(2) and ATF GDB tests pass.

This change improves reliability of the threading ptrace(2) code.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.355 15-Jul-2019 pgoyette

Move a comment line to get it next to the line it describes, avoiding
intervening unrelated text.

NFCI


# 1.354 21-Jun-2019 kamil

Eliminate PS_NOTIFYSTOP remnants from the kernel

This flag used to be useful in /proc (BSD4.4-style) debugging semantics.
Traced child events were notified without signaling the parent.

This property was removed in NetBSD-8.0 and had no users.

This change simplifies the signal code, removing dead branches.

NFCI


# 1.353 11-Jun-2019 kamil

Add support for PTRACE_POSIX_SPAWN to report posix_spawn(3) events

posix_spawn(3) is a first class syscall in NetBSD, different to
(V)FORK+EXEC as these operations are executed in one go. This differs to
Linux and FreeBSD, where posix_spawn(3) is implemented with existing kernel
primitives (clone(2), vfork(2), exec(3)) inside libc.

Typically LLDB and GDB software is aware of FORK/VFORK events. As discussed
with the LLDB community, instead of slicing the posix_spawn(3) operation
into phases emulating (V)FORK+EXEC(+VFORK_DONE) and returning intermediate
state to the debugger, that might have abnormal state, introduce new event
type: PTRACE_POSIX_SPAWN.

A debugger implementor can easily map it into existing fork+exec semantics
or treat as a distinct event.

There is no functional change for existing debuggers as there was no
support for reporting posix_spawn(3) events on the kernel side.


Revision tags: phil-wifi-20190609 isaki-audio2-base
# 1.352 06-Apr-2019 kamil

Centralized shared part of child_return() into MI part

Add a new function md_child_return() for MD specific bits only.

New child_return() is now part of MI and central code that handles
uniformly tracing code (KTR and ptrace(2)).

Synchronize value passed to ktrsysret() among ports to SYS_fork. This is
a traditional value and accessing p_lflag to check for PL_PPWAIT shall
use locking against proc_lock. Returning SYS_fork vs SYS_vfork still isn't
correct enough as there are more entry points to forking code. Instead of
making it too good, just settle with plain SYS_fork for all ports.


# 1.351 01-Mar-2019 christos

PR/53998: Joel Bertrand: Limit the number of semaphores on a
per-user basis not a per-process. We cannot really keep track on
a per-process basis because a parent process can create the semaphore
and a child can free it taking credit for it. There is also a
similar issue about resource exhaustion if we limited the number
of lwps per process as opposed to per user (which we don't).


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.350 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
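
The tri-state policy above can be sketched as a small decision function.
This is an illustration under stated assumptions: the enum, its constant
names and both helpers are hypothetical and are not the kernel's
get_expose_address() implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* The three values of kern.expose_address described above. */
enum expose_policy_sketch {
	EXPOSE_NONE_SKETCH = 0,	/* no access */
	EXPOSE_KMEM_SKETCH = 1,	/* processes with /dev/kmem open */
	EXPOSE_ALL_SKETCH  = 2	/* everyone */
};

/* Default: 0 on KASLR kernels, 1 otherwise. */
static enum expose_policy_sketch
default_policy(bool kaslr_kernel)
{
	return kaslr_kernel ? EXPOSE_NONE_SKETCH : EXPOSE_KMEM_SKETCH;
}

/* Decide once per request whether kernel addresses may be shown. */
static bool
may_see_addresses(enum expose_policy_sketch policy, bool has_kmem_open)
{
	assert(policy >= EXPOSE_NONE_SKETCH && policy <= EXPOSE_ALL_SKETCH);
	switch (policy) {
	case EXPOSE_ALL_SKETCH:
		return true;
	case EXPOSE_KMEM_SKETCH:
		return has_kmem_open;
	default:
		return false;
	}
}
```

Evaluating the decision once per sysctl request, rather than once per
process listed, is the efficiency point the message makes.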


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.349 10-Aug-2018 pgoyette

Allow syscall_establish() to install new syscalls when the existing
entry-point is either sys_nomodule or sys_nosys. Update the
makesyscalls.sh script to create a const array of bits to allow
syscall_disestablish() to properly restore the original entry-point.
Update all the initializers of struct emul to initialize the pointer
to the bit array struct emul.
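
One way to picture the bit array is sketched below. Everything here is an
assumption made for illustration: the names ending in _sketch, the
four-entry table and the error value are hypothetical and do not reproduce
the output of makesyscalls.sh or the real syscall_establish().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef int (*sy_call_sketch_t)(void);

static int sys_nosys_sketch(void) { return ENOSYS; }
static int sys_nomodule_sketch(void) { return ENOSYS; }
static int example_call_sketch(void) { return 0; }	/* a "module" syscall */

#define NSYS_SKETCH 4
static sy_call_sketch_t sysent_sketch[NSYS_SKETCH] = {
	sys_nosys_sketch, sys_nomodule_sketch,
	sys_nosys_sketch, sys_nomodule_sketch
};

/* Bit n set: slot n originally held sys_nomodule (slots 1 and 3 here). */
static const uint32_t nomodbits_sketch[] = { 0x0000000a };

/* Install fn only if the slot currently holds one of the placeholders. */
static int
establish_sketch(int n, sy_call_sketch_t fn)
{
	assert(n >= 0 && n < NSYS_SKETCH);
	if (sysent_sketch[n] != sys_nosys_sketch &&
	    sysent_sketch[n] != sys_nomodule_sketch)
		return EBUSY;
	sysent_sketch[n] = fn;
	return 0;
}

/* Put back whichever placeholder the slot started with. */
static void
disestablish_sketch(int n)
{
	int was_nomodule;

	assert(n >= 0 && n < NSYS_SKETCH);
	was_nomodule = (nomodbits_sketch[n / 32] >> (n % 32)) & 1;
	sysent_sketch[n] = was_nomodule ?
	    sys_nomodule_sketch : sys_nosys_sketch;
}
```

The constant bitmap is what makes the restore step possible: once a slot
has been overwritten, the table itself no longer records which placeholder
was there.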

XXX Regen of all files created by makesyscalls.sh will come soon,
XXX followed by a kernel version bump (since struct emul is being
XXX modified).

This commit should address PR kern/45781 and also removes the need
for the work-around for that PR in file

sys/arch/usermode/modules/syscallemu/syscallemu.c


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.348 09-May-2018 kre

branches: 1.348.2;

Cause a process's user and system times to become non-decreasing.

This alters the invented values (ie: statistically calculated)
that are returned - for small values, the values are likely going to
be different than they were, but that's largely nonsense anyway
(except that the sum of utime & stime does equal cpu time consumed
by the process). Once the values get large enough to be meaningful
the difference made by this change will be in the noise, and irrelevant.

This needs a couple of additions to struct proc, so we are now into 8.99.17
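
One simple way to make reported times non-decreasing is to remember the
largest value handed out so far and never report less. The sketch below
shows only that clamp; it is not the algorithm of this commit, it does not
try to preserve the utime + stime sum mentioned above, and all names in it
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-process memory of the largest values reported so far (usec). */
struct times_sketch {
	uint64_t last_utime;
	uint64_t last_stime;
};

/* Raise *last to the new estimate if it grew; return the value to report. */
static uint64_t
clamp_up(uint64_t *last, uint64_t estimate)
{
	if (estimate > *last)
		*last = estimate;
	return *last;
}

/* Report user/system times that never go backwards between calls. */
static void
report_times(struct times_sketch *t, uint64_t est_u, uint64_t est_s,
    uint64_t *utime, uint64_t *stime)
{
	assert(t != NULL && utime != NULL && stime != NULL);
	*utime = clamp_up(&t->last_utime, est_u);
	*stime = clamp_up(&t->last_stime, est_s);
}
```

The two remembered values correspond to the kind of extra per-process
state the message says had to be added to struct proc.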


# 1.347 06-May-2018 kamil

Remove an element from struct emul: e_tracesig

e_tracesig used to be implemented for Darwin compat. Nowadays the Darwin
compatibility layer is gone and there are no other users.

This functionality isn't used where it shall be used in the existing
codebase.

If we want to emulate debugging interfaces in compat layers we would need
to implement that from scratch anyway. We would need to be bug compatible
with other OSes too.

Proposed on tech-kern@.

Welcome to NetBSD 8.99.16!

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422
# 1.346 19-Apr-2018 christos

s/static inline/static __inline/g for consistency with other include
headers.


# 1.345 16-Apr-2018 kamil

Remove the rnewprocp argument from fork1(9)

It's now unused and it can cause use-after-free scenarios as noted by
<Mateusz Guzik>.

Reference: http://mail-index.netbsd.org/tech-kern/2017/09/08/msg022267.html

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.344 09-Jan-2018 maya

branches: 1.344.2;
remove struct emul's e_fault.

It used to be used by COMPAT_IRIX for the purpose of overriding
uvm_fault (only implemented in MIPS), now removed.

Ride 8.99.12 version bump.


Revision tags: tls-maxphys-base-20171202
# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->p_path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

branches: 1.340.6;
factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after FreeBSD API with the usage.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, so its remnants are removed without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
	int	pe_report_event;
	pid_t	pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
	int	pe_report_event;
	union {
		pid_t	_pe_other_pid;
		lwpid_t	_pe_lwp;
	} _option;
} ptrace_state_t;

#define pe_other_pid	_option._pe_other_pid
#define pe_lwp		_option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
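
The size argument can be checked with a small stand-alone sketch. The
typedefs below are stand-ins that assume pid_t and lwpid_t are both 32-bit
integers, as the message states; the struct names with _old/_new suffixes
and the helper function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t pid_sketch_t;	/* stand-in for pid_t */
typedef int32_t lwpid_sketch_t;	/* stand-in for lwpid_t */

/* Layout before the change. */
struct ptrace_state_old {
	int		pe_report_event;
	pid_sketch_t	pe_other_pid;
};

/* Layout after the change: the second member becomes a union. */
struct ptrace_state_new {
	int		pe_report_event;
	union {
		pid_sketch_t	_pe_other_pid;
		lwpid_sketch_t	_pe_lwp;
	} _option;
};

/* True if both layouts have the same size and member offsets. */
static int
layouts_match(void)
{
	return sizeof(struct ptrace_state_old) ==
	    sizeof(struct ptrace_state_new) &&
	    offsetof(struct ptrace_state_old, pe_other_pid) ==
	    offsetof(struct ptrace_state_new, _option);
}
```

Because both union members have the same width, the union occupies exactly
the space the single pid field did, so old binaries keep reading the value
at the same offset.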


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP termination is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
parent gives birth to new process child and stops till it exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE is notification to notify a debugger that a parent has
resumed after vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
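
A pointer-size macro of this kind might be sketched as follows. The flag
name, the struct and the macro body are assumptions for illustration; the
real PROC_PTRSZ() definition is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PK_32_SKETCH	0x1	/* hypothetical "32-bit process" flag */

struct proc_sketch {
	int p_flag;
};

/* Pointer size as seen by the process, not by the kernel. */
#define PROC_PTRSZ_SKETCH(p) \
	(((p)->p_flag & PK_32_SKETCH) ? sizeof(uint32_t) : sizeof(void *))

/* Bytes needed in the target for a NULL-terminated vector of n pointers. */
static size_t
ptr_vector_bytes(const struct proc_sketch *p, size_t n)
{
	assert(p != NULL);
	return (n + 1) * PROC_PTRSZ_SKETCH(p);
}
```

Code that lays out argv-style vectors for another process can then size
them by the target's pointer width instead of the kernel's own.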


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicated that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
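
With the three pieces stored separately, a wait(2)-style status word can
be assembled on demand. The sketch below assumes the traditional encoding
(exit code in bits 8-15, core flag 0x80, signal in the low 7 bits); the
struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three pieces p_xstat was split into, as plain fields. */
struct exit_info_sketch {
	int	xexit;		/* exit code, cf. p_xexit */
	int	xsig;		/* terminating signal or 0, cf. p_xsig */
	bool	coredump;	/* cf. the core-dump flag bit */
};

/* Pack into the assumed layout: code << 8 | core (0x80) | signal. */
static int
make_wait_status(const struct exit_info_sketch *ei)
{
	assert(ei != NULL);
	return ((ei->xexit & 0xff) << 8) |
	    (ei->coredump ? 0x80 : 0) |
	    (ei->xsig & 0x7f);
}
```

Keeping the fields apart avoids having to guess from context whether a
stored value is a full status word or a bare signal number.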


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adopt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub to call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time.


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout from proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but do no more.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu are not likely increased.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
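
The effect described in 1.207 can be illustrated with a small userland
sketch. Nothing below is taken from the commit itself: the FSHIFT/FSCALE
constants, the decay formula (modeled on the classic BSD filter
cpu * 2*load / (2*load + 1)), the function names, and the "one tick every
other second" arrival rate (roughly what each of 32 runnable processes
would see from a 16 Hz clock) are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point format: 11 fractional bits. */
#define FSHIFT	11
#define FSCALE	(1 << FSHIFT)

typedef uint32_t fixpt_t;

/*
 * Integer decay: truncating division means any nonzero value drops by
 * at least 1 per call, no matter how high the load is.
 */
static unsigned
decay_int(unsigned load, unsigned estcpu)
{
	unsigned loadfac = 2 * load;

	return (loadfac * estcpu) / (loadfac + 1);
}

/* Fixed-point decay: the same filter, but fractions survive. */
static fixpt_t
decay_fixpt(unsigned load, fixpt_t estcpu)
{
	uint64_t loadfac = (uint64_t)2 * load * FSCALE;

	return (fixpt_t)((loadfac * estcpu) / (loadfac + FSCALE));
}

/* One tick arrives every other second; decay runs every second. */
static unsigned
simulate_int(unsigned load, unsigned seconds)
{
	unsigned estcpu = 0;

	assert(load > 0);
	for (unsigned s = 0; s < seconds; s++) {
		if (s % 2 == 0)
			estcpu += 1;
		estcpu = decay_int(load, estcpu);
	}
	return estcpu;
}

static fixpt_t
simulate_fixpt(unsigned load, unsigned seconds)
{
	fixpt_t estcpu = 0;

	assert(load > 0);
	for (unsigned s = 0; s < seconds; s++) {
		if (s % 2 == 0)
			estcpu += FSCALE;	/* one whole tick */
		estcpu = decay_fixpt(load, estcpu);
	}
	return estcpu;
}
```

Under these assumptions the integer estimate is wiped out by every decay
step and never accumulates, while the fixed-point estimate settles at a
value well above zero, which is the behaviour the commit message argues for.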


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
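
A minimal sketch of the marker-skipping iteration idea, for readers
unfamiliar with the pattern. The struct, field, macro, and function names
here (xproc, xp_marker, XPROCLIST_FOREACH, and so on) are hypothetical
stand-ins and a plain singly linked list is used; this is not the kernel's
struct proc, its queue(3) lists, or the real PROCLIST_FOREACH definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified list entry. */
struct xproc {
	int		 xp_pid;
	int		 xp_marker;	/* nonzero: placeholder, not a process */
	struct xproc	*xp_next;
};

/* Advance from p to the first non-marker entry, or NULL at the end. */
static struct xproc *
xproc_skip_markers(struct xproc *p)
{
	while (p != NULL && p->xp_marker)
		p = p->xp_next;
	return p;
}

/* Like a plain list walk, but marker entries are never handed out. */
#define XPROCLIST_FOREACH(var, head)					\
	for ((var) = xproc_skip_markers(head);				\
	    (var) != NULL;						\
	    (var) = xproc_skip_markers((var)->xp_next))

/* Example consumer: add up the PIDs of the real entries. */
static int
xproc_pid_sum(struct xproc *head)
{
	struct xproc *p;
	int sum = 0;

	XPROCLIST_FOREACH(p, head) {
		assert(!p->xp_marker);	/* markers must never be visible */
		sum += p->xp_pid;
	}
	return sum;
}
```

The point of the marker is that an iterator which must drop the list lock
while calling out can leave a placeholder at its position; every other
walker then has to step over such placeholders, which is what the macro
hides from its callers.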


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep while awaiting the debugger's opinion
about the signal without causing a problem. The hook is hence located in
issignal, at the place where SIGCHLD is normally sent to the debugger
while the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception) and stopping the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.
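
The powerpc/oea item above (count executable mappings per segment, set the
no-exec bit iff the count is 0) can be sketched as follows. This is an
illustration only: the sk_* names, the 16-bit mask and the 16 x 256 MB
segment layout are assumptions made for the sketch, not the pmap code
from this commit.

```c
#include <assert.h>
#include <stdint.h>

#define SK_NSEGS     16     /* assumed: 32-bit space, 256 MB per segment */
#define SK_SEG_SHIFT 28

struct sk_pmap {
    unsigned exec_count[SK_NSEGS];  /* executable mappings per segment */
    uint16_t noexec_mask;           /* bit n set: segment n is no-exec */
};

static void
sk_pmap_init(struct sk_pmap *pm)
{
    for (int i = 0; i < SK_NSEGS; i++)
        pm->exec_count[i] = 0;
    pm->noexec_mask = 0xffff;       /* nothing executable yet */
}

/* An executable mapping was entered at va: segment must allow exec. */
static void
sk_enter_exec(struct sk_pmap *pm, uint32_t va)
{
    unsigned seg = va >> SK_SEG_SHIFT;

    pm->exec_count[seg]++;
    pm->noexec_mask &= (uint16_t)~(1u << seg);
}

/* An executable mapping at va went away: no-exec again iff count hits 0. */
static void
sk_remove_exec(struct sk_pmap *pm, uint32_t va)
{
    unsigned seg = va >> SK_SEG_SHIFT;

    assert(pm->exec_count[seg] > 0);
    if (--pm->exec_count[seg] == 0)
        pm->noexec_mask |= (uint16_t)(1u << seg);
}

static int
sk_seg_is_noexec(const struct sk_pmap *pm, uint32_t va)
{
    return (pm->noexec_mask >> (va >> SK_SEG_SHIFT)) & 1;
}
```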

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.
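
A rough sketch of the "explicit new process, else chooseproc()" selection
described above. All sk_* names and the array-based run queue are
hypothetical. Unlike the real chooseproc(), which waits for a process to
appear, this stand-in simply returns NULL when the queue is empty, and it
does not model removing an explicitly chosen process from the queue.

```c
#include <assert.h>
#include <stddef.h>

#define SK_RUNQ_MAX 8

struct sk_proc {
    int pid;
};

static struct sk_proc *sk_runq[SK_RUNQ_MAX];    /* FIFO run queue */
static size_t sk_runq_len;

/* Append a runnable process to the queue. */
static void
sk_setrunqueue(struct sk_proc *p)
{
    assert(sk_runq_len < SK_RUNQ_MAX);
    sk_runq[sk_runq_len++] = p;
}

/* Stand-in for chooseproc(): take the head of the queue, if any. */
static struct sk_proc *
sk_chooseproc(void)
{
    struct sk_proc *p;
    size_t i;

    if (sk_runq_len == 0)
        return NULL;
    p = sk_runq[0];
    for (i = 1; i < sk_runq_len; i++)
        sk_runq[i - 1] = sk_runq[i];
    sk_runq_len--;
    return p;
}

/* The caller may name the next process; NULL defers to the scheduler. */
static struct sk_proc *
sk_switch_target(struct sk_proc *newp)
{
    return newp != NULL ? newp : sk_chooseproc();
}
```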


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct emul to provide an emulation-specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX-specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
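
The "NULL means use the default handler" convention for e_fault can be
sketched like this. The types are simplified on purpose: sk_emul,
sk_fault_fn and both handlers are hypothetical, and the real hook takes
the process, address, fault type and protection as shown in the
prototypes above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified handler type; the real prototypes take more arguments. */
typedef int (*sk_fault_fn)(unsigned long va);

struct sk_emul {
    const char *e_name;
    sk_fault_fn e_fault;    /* NULL: fall back to the default handler */
};

/* Stand-in for uvm_fault(). */
static int
sk_default_fault(unsigned long va)
{
    (void)va;
    return 0;
}

/* Stand-in for an emulation-specific handler such as irix_vm_fault. */
static int
sk_irix_fault(unsigned long va)
{
    (void)va;
    return 1;
}

/* What a trap handler would do: prefer the emulation hook if present. */
static int
sk_handle_fault(const struct sk_emul *e, unsigned long va)
{
    assert(e != NULL);
    if (e->e_fault != NULL)
        return e->e_fault(va);
    return sk_default_fault(va);
}
```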


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.
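
A hedged sketch of the per-signal trampoline selection outlined in the
list above: version 0 selects the legacy kernel-provided trampoline, any
other version uses the address userland registered. The struct layout,
field names and sk_* functions are assumptions made for illustration and
are not copied from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>

#define SK_NSIG 32

/* Hypothetical per-signal descriptor: handler plus trampoline/version. */
struct sk_sigdesc {
    void (*sd_handler)(int);
    const void *sd_tramp;   /* userland trampoline, if sd_vers != 0 */
    int sd_vers;            /* 0: legacy kernel-provided trampoline */
};

struct sk_sigacts {
    struct sk_sigdesc sa_sigdesc[SK_NSIG];
};

/* Stand-in for the address of the kernel-provided trampoline. */
static const char sk_kernel_tramp[1] = { 0 };

/* Register a handler together with its trampoline and version. */
static void
sk_sigaction(struct sk_sigacts *ps, int sig, void (*handler)(int),
    const void *tramp, int vers)
{
    assert(sig > 0 && sig < SK_NSIG);
    ps->sa_sigdesc[sig].sd_handler = handler;
    ps->sa_sigdesc[sig].sd_tramp = tramp;
    ps->sa_sigdesc[sig].sd_vers = vers;
}

/* What a sendsig()-like function would consult when delivering sig. */
static const void *
sk_select_tramp(const struct sk_sigacts *ps, int sig)
{
    const struct sk_sigdesc *sd;

    assert(sig > 0 && sig < SK_NSIG);
    sd = &ps->sa_sigdesc[sig];
    return sd->sd_vers == 0 ? (const void *)sk_kernel_tramp : sd->sd_tramp;
}
```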


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed the obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
OS-specific async I/O behavior should now be handled in OS-specific code.
Linux has been done, but other emulations still need to be handled. See
case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data collected so far:

                                   Net  Free  Open  Linux  SunOS  AIX  OSF1  Darwin
send SIGIO to write end of pipe    Y    N     N     N      N      N    Y     Y
send SIGIO to read end of pipe     Y    Y     N     N      N      ?    Y     ?
send SIGIO to write end of socket  Y    Y     Y     N      N      Y    Y     Y
send SIGIO to read end of socket   Y    Y     Y     Y      Y      ?    Y     ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke the build
of several emulation packages


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
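
A sketch of how a compatibility macro can keep the old API on top of the
new interlock-taking function. Everything here is a userland stand-in:
sk_simplelock, sk_ltsleep() and sk_tsleep() are hypothetical names with a
reduced argument list, and the actual sleeping and any reacquisition of
the interlock on wakeup are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy lock: just records whether it is held. */
struct sk_simplelock {
    int locked;
};

/*
 * Stand-in for ltsleep().  The real function would first lock the
 * scheduler and queue the caller on chan; only then is the interlock
 * dropped, so a wakeup on chan cannot slip in between.
 */
static int
sk_ltsleep(const void *chan, int timo, struct sk_simplelock *interlock)
{
    (void)chan;
    (void)timo;
    if (interlock != NULL) {
        assert(interlock->locked);
        interlock->locked = 0;      /* released once we are queued */
    }
    /* The real code would switch away here until wakeup or timeout. */
    return 0;
}

/* Old-style API: same call, no interlock. */
#define sk_tsleep(chan, timo)   sk_ltsleep((chan), (timo), NULL)
```

Existing callers keep compiling unchanged through the macro, while new
callers that hold a lock can pass it in and close the race.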


# 1.97 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.
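
The KERN_PROC2 idea of letting the caller state the element size can be
sketched as below: each record is copied out truncated (or zero-padded)
to the size the caller asked for, so a binary built against an older,
shorter structure still gets well-formed records. The struct and function
names are hypothetical and the field list is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "current" record; p_newfield was added after old binaries. */
struct sk_kinfo {
    int p_pid;
    int p_uid;
    int p_newfield;
};

/*
 * Copy up to elem_count records into dst, each occupying exactly
 * elem_size bytes.  Fields past elem_size are dropped; if the caller's
 * element is larger than ours, the tail is zero-filled.
 * Returns the number of records written.
 */
static size_t
sk_copy_procs(void *dst, size_t elem_size, size_t elem_count,
    const struct sk_kinfo *src, size_t nsrc)
{
    size_t n = nsrc < elem_count ? nsrc : elem_count;
    size_t copy = elem_size < sizeof(*src) ? elem_size : sizeof(*src);
    char *out = dst;
    size_t i;

    assert(elem_size > 0);
    for (i = 0; i < n; i++) {
        memset(out, 0, elem_size);
        memcpy(out, &src[i], copy);
        out += elem_size;
    }
    return n;
}
```

This only works if new fields are appended at the end of the structure,
which is the assumption the sketch makes.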

With input and suggestions from many people on tech-kern.


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
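
One way to get both properties named above (caller-supplied storage,
constant-time insert and remove) is to embed the list linkage in the
handle and hash handles into buckets by expiry tick. The log does not say
which data structure the commit uses, so treat the bucket array and all
sk_* names below as assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SK_WHEEL_SIZE 8

/* Handle storage is owned by the client; nothing is allocated here. */
struct sk_callout {
    struct sk_callout *next;
    struct sk_callout **prevp;  /* address of the pointer pointing at us */
    void (*func)(void *);
    void *arg;
};

static struct sk_callout *sk_wheel[SK_WHEEL_SIZE];

/* O(1): push the handle onto the bucket for its expiry tick. */
static void
sk_callout_insert(struct sk_callout *c, unsigned tick,
    void (*func)(void *), void *arg)
{
    struct sk_callout **head = &sk_wheel[tick % SK_WHEEL_SIZE];

    assert(c->prevp == NULL);   /* must not already be pending */
    c->func = func;
    c->arg = arg;
    c->next = *head;
    if (c->next != NULL)
        c->next->prevp = &c->next;
    c->prevp = head;
    *head = c;
}

/* O(1): unlink via the back pointer, no list walk needed. */
static void
sk_callout_remove(struct sk_callout *c)
{
    if (c->prevp == NULL)
        return;                 /* not pending */
    *c->prevp = c->next;
    if (c->next != NULL)
        c->next->prevp = c->prevp;
    c->next = NULL;
    c->prevp = NULL;
}

static int
sk_callout_pending(const struct sk_callout *c)
{
    return c->prevp != NULL;
}
```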


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
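
The sizing rule described above can be sketched roughly as follows. The
function name is hypothetical, and treating a bound of 0 as "no bound"
is an assumption of this sketch, not something the commit states.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: take one quarter of physical memory (in pages),
 * then clamp to the machine-dependent lower and upper bounds.
 * A bound of 0 means "no bound" in this sketch.
 */
static size_t
sk_nkmempages_autosize(size_t physpages, size_t min_pages, size_t max_pages)
{
	size_t npages = physpages / 4;

	assert(max_pages == 0 || min_pages <= max_pages);
	if (npages < min_pages)
		npages = min_pages;
	if (max_pages != 0 && npages > max_pages)
		npages = max_pages;
	return npages;
}
```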


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace the kern.shortcorename sysctl with a more flexible scheme,
a core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename.
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format and resource limits). The process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock in
the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock every where else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
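
As an illustration of the state split described above, a minimal sketch
might look like this. The sk_* names and the numeric state values are
placeholders chosen for the example; the real definitions live in
<sys/proc.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder state values, for illustration only. */
enum sk_pstat { SK_SRUN = 2, SK_SZOMB = 5, SK_SDEAD = 6 };

struct sk_proc {
	int p_stat;
};

/* Treat both dead-but-unreaped and fully reaped procs as zombies. */
#define SK_P_ZOMBIE(p) \
	((p)->p_stat == SK_SZOMB || (p)->p_stat == SK_SDEAD)

/* Only a proc the reaper has moved to SZOMB is ready for wait4. */
static int
sk_ready_for_wait(const struct sk_proc *p)
{
	assert(p != NULL);
	return p->p_stat == SK_SZOMB;
}
```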


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
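
A small sketch of the offset arithmetic implied above: biasing the nice
value by NZERO keeps the stored value non-negative, so it fits an
unsigned char. The helper names are hypothetical, and the -20..20 range
is an assumption of this example.

```c
#include <assert.h>

#define SK_NZERO	20	/* value named in the log message */
#define SK_NICE_MIN	(-20)	/* assumed range for this sketch */
#define SK_NICE_MAX	20

/* Store a user-visible nice value as a non-negative p_nice. */
static unsigned char
sk_nice_to_pnice(int nice)
{
	assert(nice >= SK_NICE_MIN && nice <= SK_NICE_MAX);
	return (unsigned char)(nice + SK_NZERO);
}

/* Recover the user-visible nice value from the stored p_nice. */
static int
sk_pnice_to_nice(unsigned char p_nice)
{
	return (int)p_nice - SK_NZERO;
}
```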


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.
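
A hold count of this kind can be sketched as below. The sk_* names and
the p_holdcnt field are used here for illustration only and are not
copied from the header at this revision.

```c
#include <assert.h>

struct sk_proc {
	int p_holdcnt;	/* number of outstanding holds */
};

/* Take a hold: the process must stay in core while held. */
static void
sk_phold(struct sk_proc *p)
{
	p->p_holdcnt++;
}

/* Drop a hold; releasing without a matching hold is a bug. */
static void
sk_prele(struct sk_proc *p)
{
	assert(p->p_holdcnt > 0);
	p->p_holdcnt--;
}

/* The process may be swapped out only when nobody holds it. */
static int
sk_swappable(const struct sk_proc *p)
{
	return p->p_holdcnt == 0;
}
```

Unlike a single flag bit, a counter lets holds nest: two independent
holders can each release without clearing the other's hold.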


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location , and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.365 07-May-2020 kamil

On debugger attach to a prestarted process don't report SIGTRAP

Introduce PSL_TRACEDCHILD that indicates tracking of birth of a process.
A freshly forked process checks whether it is traced and if so, reports
SIGTRAP + TRAP_CHLD event to a debugger as a result of tracking forks-like
events. There is a time window when a debugger can attach to a newly
created process and receive SIGTRAP + TRAP_CHLD instead of SIGSTOP.

Fixes races in t_ptrace_wait* tests when a test hangs or misbehaves,
especially the ones reported in tracer_sysctl_lookup_without_duplicates.


# 1.364 29-Apr-2020 thorpej

- proc_find() retains traditional semantics of requiring the canonical
PID to look up a proc. Add a separate proc_find_lwpid() to look up a
proc by the ID of any of its LWPs.
- Add proc_find_lwp_acquire_proc(), which enables looking up the LWP
*and* a proc given the ID of any LWP. Returns with the proc::p_lock
held.
- Rewrite lwp_find2() in terms of proc_find_lwp_acquire_proc(), and
allow the proc to be wildcarded, rather than just curproc or specific
proc.
- lwp_find2() now subsumes the original intent of lwp_getref_lwpid(), but
in a much nicer way, so garbage-collect the remnants of that recently
added mechanism.


Revision tags: bouyer-xenpvh-base2
# 1.363 24-Apr-2020 thorpej

Overhaul the way LWP IDs are allocated. Instead of each LWP having its
own LWP ID space, LWP IDs now come from the same number space as PIDs. The
lead LWP of a process gets the PID as its LID. If a multi-LWP process's
lead LWP exits, the PID persists for the process.

In addition to providing system-wide unique thread IDs, this also lets us
eliminate the per-process LWP radix tree, and some associated locks.

Remove the separate "global thread ID" map added previously; it is no longer
needed to provide this functionality.

Nudged in this direction by ad@ and chs@.


Revision tags: phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.362 06-Apr-2020 kamil

branches: 1.362.2;
Reintroduce struct proc::p_oppid

Relying on p_opptr is not safe as there is a race between:
- spawner giving birth to a child process and being killed
- spawnee accessing p_opptr and reporting TRAP_CHLD

PR kern/54786 by Andreas Gustafsson


# 1.361 05-Apr-2020 christos

There is no "s" lock.


# 1.360 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.359 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.358 29-Jan-2020 ad

- Track LWPs in a per-process radixtree. It uses no extra memory in the
single threaded case. Replace scans of p->p_lwps with lookups in the
tree. Find free LIDs for new LWPs in the tree. Replace the hashed sleep
queues for park/unpark with lookups in the tree under cover of a RW lock.

- lwp_wait(): if waiting on a specific LWP, find the LWP via tree lookup and
return EINVAL if it's detached, not ESRCH.

- Group the locks in struct proc at the end of the struct in their own cache
line.

- Add some comments.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.357 12-Oct-2019 kamil

branches: 1.357.2;
Remove now unused p_oppid from struct proc


# 1.356 30-Sep-2019 kamil

Move TRAP_CHLD/TRAP_LWP ptrace information from struct proc to siginfo

Storing struct ptrace_state information inside struct proc was vulnerable
to synchronization bugs, as multiple events emitted at the same time were
overwriting one another.

Cache the original parent process id in p_oppid. Reusing p_opptr here is
in theory prone to a slight race condition.

Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for calls
prompting for the value in cases when no appropriate event was
registered.

Add an alternative approach to check the ptrace_state information, directly
from the siginfo_t value returned from PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compat with older NetBSD and
OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.

Add a couple of compile-time asserts for assumptions in the code.

No functional change intended in existing ptrace(2) software.

All ATF ptrace(2) and ATF GDB tests pass.

This change improves reliability of the threading ptrace(2) code.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.355 15-Jul-2019 pgoyette

Move a comment line to get it next to the line it describes, avoiding
intervening unrelated text.

NFCI


# 1.354 21-Jun-2019 kamil

Eliminate PS_NOTIFYSTOP remnants from the kernel

This flag used to be useful in /proc (BSD4.4-style) debugging semantics.
Traced child events were notified without signaling the parent.

This property was removed in NetBSD-8.0 and had no users.

This change simplifies the signal code, removing dead branches.

NFCI


# 1.353 11-Jun-2019 kamil

Add support for PTRACE_POSIX_SPAWN to report posix_spawn(3) events

posix_spawn(3) is a first class syscall in NetBSD, different from
(V)FORK+EXEC as these operations are executed in one go. This differs from
Linux and FreeBSD, where posix_spawn(3) is implemented with existing kernel
primitives (clone(2), vfork(2), exec(3)) inside libc.

Typically LLDB and GDB software is aware of FORK/VFORK events. As discussed
with the LLDB community, instead of slicing the posix_spawn(3) operation
into phases emulating (V)FORK+EXEC(+VFORK_DONE) and returning intermediate
state to the debugger, that might have abnormal state, introduce a new event
type: PTRACE_POSIX_SPAWN.

A debugger implementor can easily map it into existing fork+exec semantics
or treat as a distinct event.

There is no functional change for existing debuggers as there was no
support for reporting posix_spawn(3) events on the kernel side.


Revision tags: phil-wifi-20190609 isaki-audio2-base
# 1.352 06-Apr-2019 kamil

Centralize the shared part of child_return() into the MI part

Add a new function md_child_return() for MD specific bits only.

New child_return() is now part of MI and central code that uniformly
handles tracing code (KTR and ptrace(2)).

Synchronize value passed to ktrsysret() among ports to SYS_fork. This is
a traditional value and accessing p_lflag to check for PL_PPWAIT shall
use locking against proc_lock. Returning SYS_fork vs SYS_vfork still isn't
correct enough as there are more entry points to forking code. Instead of
making it too good, just settle with plain SYS_fork for all ports.


# 1.351 01-Mar-2019 christos

PR/53998: Joel Bertrand: Limit the number of semaphores on a
per-user basis not a per-process. We cannot really keep track on
a per-process basis because a parent process can create the semaphore
and a child can free it taking credit for it. There is also a
similar issue about resource exhaustion if we limited the number
of lwps per process as opposed to per user (which we don't).


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.350 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
	0: no access
	1: access to processes with open /dev/kmem
	2: access to everyone
  defaults:
	0: KASLR kernels
	1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
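
The tri-state policy listed above could be modelled roughly like this.
The names are hypothetical and the sketch only mirrors the table in this
message; it is not the kernel's get_expose_address().

```c
#include <assert.h>
#include <stdbool.h>

/* The three modes of kern.expose_address, as listed in the message. */
enum sk_expose {
	SK_EXPOSE_NONE = 0,	/* no access */
	SK_EXPOSE_KMEM = 1,	/* processes with /dev/kmem open */
	SK_EXPOSE_ALL = 2	/* everyone */
};

/* Decide whether kernel addresses may be shown to a requester. */
static bool
sk_may_expose(int mode, bool has_kmem_open)
{
	assert(mode >= SK_EXPOSE_NONE && mode <= SK_EXPOSE_ALL);
	switch (mode) {
	case SK_EXPOSE_ALL:
		return true;
	case SK_EXPOSE_KMEM:
		return has_kmem_open;
	default:
		return false;
	}
}

/* Default mode: 0 on KASLR kernels, 1 otherwise. */
static int
sk_default_mode(bool kaslr)
{
	return kaslr ? SK_EXPOSE_NONE : SK_EXPOSE_KMEM;
}
```

Evaluating the mode once per sysctl call and passing the result down
matches the efficiency point made above.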


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.349 10-Aug-2018 pgoyette

Allow syscall_establish() to install new syscalls when the existing
entry-point is either sys_nomodule or sys_nosys. Update the
makesyscalls.sh script to create a const array of bits to allow
syscall_disestablish() to properly restore the original entry-point.
Update all the initializers of struct emul to initialize the pointer
to the bit array struct emul.

XXX Regen of all files created by makesyscalls.sh will come soon,
XXX followed by a kernel version bump (since struct emul is being
XXX modified).

This commit should address PR kern/45781 and also removes the need
for the work-around for that PR in file

sys/arch/usermode/modules/syscallemu/syscallemu.c


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.348 09-May-2018 kre

branches: 1.348.2;

Cause a process's user and system times to become non-decreasing.

This alters the invented values (ie: statistically calculated)
that are returned - for small values, the values are likely going to
be different than they were, but that's largely nonsense anyway
(except that the sum of utime & stime does equal cpu time consumed
by the process). Once the values get large enough to be meaningful
the difference made by this change will be in the noise, and irrelevant.

This needs a couple of additions to struct proc, so we are now into 8.99.17


# 1.347 06-May-2018 kamil

Remove an element from struct emul: e_tracesig

e_tracesig used to be implemented for Darwin compat. Nowadays the Darwin
compatibility layer is gone and there are no other users.

This functionality isn't used where it shall be used in the existing
codebase.

If we want to emulate debugging interfaces in compat layers we would need
to implement that from scratch anyway. We would need to be bug compatible
with other OSes too.

Proposed on tech-kern@.

Welcome to NetBSD 8.99.16!

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422
# 1.346 19-Apr-2018 christos

s/static inline/static __inline/g for consistency with other include
headers.


# 1.345 16-Apr-2018 kamil

Remove the rnewprocp argument from fork1(9)

It's now unused and it can cause use-after-free scenarios as noted by
<Mateusz Guzik>.

Reference: http://mail-index.netbsd.org/tech-kern/2017/09/08/msg022267.html

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.344 09-Jan-2018 maya

branches: 1.344.2;
remove struct emul's e_fault.

It used to be used by COMPAT_IRIX for the purpose of overriding
uvm_fault (only implemented in MIPS), now removed.

Ride 8.99.12 version bump.


Revision tags: tls-maxphys-base-20171202
# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->p_path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not suitable for modern
applications beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

branches: 1.340.6;
factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, so its traces are removed without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event;
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently another interface,
PT_GET_SIGINFO/PT_SET_SIGINFO, is required in netbsd32 compat code in order to
reliably validate PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.
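
As an illustration of the kind of helper a tracer tends to define for itself, here is a minimal sketch of composing an x86 DR7 value for a 4-byte write watchpoint. All SK_* names and sk_dr7_watch_write4() are hypothetical and are not part of NetBSD, FreeBSD or GDB; the sketch only assumes the architectural DR7 layout (local-enable bit at 2n, R/W field at 16+4n, LEN field at 18+4n).

```c
#include <stdint.h>

/* Hypothetical tracer-local helpers; not taken from any system header. */
#define SK_DR7_LOCAL_ENABLE(n)	(UINT64_C(1) << ((n) * 2))
#define SK_DR7_RW_SHIFT(n)	(16 + (n) * 4)
#define SK_DR7_LEN_SHIFT(n)	(18 + (n) * 4)
#define SK_DR7_RW_WRITE		UINT64_C(0x1)	/* break on data writes */
#define SK_DR7_LEN_4		UINT64_C(0x3)	/* 4-byte wide region */

/*
 * Return a copy of dr7 with breakpoint slot n (0..3) configured as a
 * locally enabled, 4-byte, write-only watchpoint.
 */
static inline uint64_t
sk_dr7_watch_write4(uint64_t dr7, unsigned n)
{
	/* Clear the old R/W and LEN fields (4 bits) of this slot. */
	dr7 &= ~(UINT64_C(0xf) << SK_DR7_RW_SHIFT(n));
	dr7 |= SK_DR7_LOCAL_ENABLE(n);
	dr7 |= SK_DR7_RW_WRITE << SK_DR7_RW_SHIFT(n);
	dr7 |= SK_DR7_LEN_4 << SK_DR7_LEN_SHIFT(n);
	return dr7;
}
```

A tracer would presumably write the watched address into the matching DR0-DR3 slot and store the resulting value as DR7 before calling PT_SETDBREGS.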

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
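
The size argument above can be checked with a small sketch. The sk_* typedefs and the old_/new_ struct names are stand-ins invented for this example; it only assumes, as stated above, that pid_t and lwpid_t are both int32_t-like.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the real types (assumption: both are 32-bit integers). */
typedef int32_t sk_pid_t;
typedef int32_t sk_lwpid_t;

/* Layout before the change. */
struct old_ptrace_state {
	int		pe_report_event;
	sk_pid_t	pe_other_pid;
};

/* Layout after the change: the second member becomes a union. */
struct new_ptrace_state {
	int	pe_report_event;
	union {
		sk_pid_t	_pe_other_pid;
		sk_lwpid_t	_pe_lwp;
	} _option;
};

/* Same total size, and the old field stays at the same offset. */
_Static_assert(sizeof(struct old_ptrace_state) ==
    sizeof(struct new_ptrace_state), "size of ptrace_state changed");
_Static_assert(offsetof(struct old_ptrace_state, pe_other_pid) ==
    offsetof(struct new_ptrace_state, _option._pe_other_pid),
    "offset of pe_other_pid changed");
```

If either assertion failed at compile time, prebuilt binaries using the old layout would no longer agree with the kernel on the structure.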


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP termination is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when a
parent gives birth to a new child process and stops until the child exits or
calls exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE notifies a debugger that a parent has resumed after a
vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
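
A minimal sketch of the idea, not the actual macro: it assumes the pointer size is selected by a per-process "32-bit" flag. SK_PK_32, struct sk_proc, SK_PROC_PTRSZ() and sk_argv_bytes() are hypothetical names, and the flag value is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_PK_32	0x4	/* hypothetical flag: process runs 32-bit code */

struct sk_proc {
	int	p_flag;
};

/* Pointer size as seen by the process, not by the kernel. */
#define SK_PROC_PTRSZ(p) \
	(((p)->p_flag & SK_PK_32) ? sizeof(uint32_t) : sizeof(void *))

/* Bytes needed in the process for an argv[] of argc entries plus NULL. */
static inline size_t
sk_argv_bytes(const struct sk_proc *p, size_t argc)
{
	return (argc + 1) * SK_PROC_PTRSZ(p);
}
```

Code that lays out or walks pointer arrays in user memory would use such a macro in place of a bare sizeof(void *).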


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
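
To illustrate how the three pieces can be recombined into a wait(2)-style status, here is a hedged sketch. The SK_* macros mirror the traditional BSD status encoding (exit code in bits 8-15, signal in the low 7 bits, core flag 0200); struct sk_proc and sk_wait_status() are hypothetical, and treating the core-dump indication as a bit with that same value in p_sflag is an assumption made for the example.

```c
/* Hypothetical stand-ins for the wait(2) status encoding. */
#define SK_WCOREFLAG		0200
#define SK_W_EXITCODE(ret, sig)	((ret) << 8 | (sig))

struct sk_proc {
	int	p_xexit;	/* exit code */
	int	p_xsig;		/* terminating signal number, or 0 */
	int	p_sflag;	/* assumed to carry the core-dump bit */
};

/* Rebuild the composite status that p_xstat used to hold. */
static inline int
sk_wait_status(const struct sk_proc *p)
{
	int status = SK_W_EXITCODE(p->p_xexit, p->p_xsig);

	if (p->p_sflag & SK_WCOREFLAG)
		status |= SK_WCOREFLAG;
	return status;
}
```

Keeping the fields separate means each consumer reads exactly the value it needs, and the composite form is only built where wait(2) semantics require it.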


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits execution of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.
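
A userland sketch of the technique, using C11 <stdatomic.h> as a stand-in for the kernel's own atomic primitives. The sk_* names are hypothetical and the real counter is maintained in the fork and exit paths.

```c
#include <stdatomic.h>

/* Hypothetical process counter; zero-initialized at file scope. */
static atomic_uint sk_nprocs;

/* Account for a new process; returns the updated count. */
static inline unsigned
sk_proc_born(void)
{
	return atomic_fetch_add(&sk_nprocs, 1) + 1;
}

/* Account for a reaped process; returns the updated count. */
static inline unsigned
sk_proc_gone(void)
{
	return atomic_fetch_sub(&sk_nprocs, 1) - 1;
}

/* Read the current count without taking any lock. */
static inline unsigned
sk_proc_count(void)
{
	return atomic_load(&sk_nprocs);
}
```

The point of the approach is that updating or reading the counter no longer needs a lock held around it.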


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's a more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout from proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but do no more.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.
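
The bookkeeping behind this fix can be sketched in a few lines. This is a simplified model, not the kernel code: the struct and the toy_fork_hook and toy_wait_hook functions are hypothetical, and the decay of the inherited amount over time (the updatepri part of the change) is left out.

```c
#include <assert.h>

struct toy_proc {
    unsigned p_estcpu;            /* estimated CPU usage */
    unsigned p_estcpu_inherited;  /* amount copied from the parent at fork */
};

/* At fork, the child starts with the parent's estcpu and remembers it. */
static void
toy_fork_hook(const struct toy_proc *parent, struct toy_proc *child)
{
    child->p_estcpu = parent->p_estcpu;
    child->p_estcpu_inherited = parent->p_estcpu;
}

/* At wait, charge the parent only for what the child added itself. */
static void
toy_wait_hook(struct toy_proc *parent, const struct toy_proc *child)
{
    if (child->p_estcpu > child->p_estcpu_inherited)
        parent->p_estcpu += child->p_estcpu - child->p_estcpu_inherited;
}
```

Without the inherited field, the wait hook would add the child's full estcpu back to the parent, which is the doubling per fork-wait cycle that the commit describes.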


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is not likely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
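
The effect of the fixed-point change can be shown with a small calculation. This is a sketch under stated assumptions: FSHIFT is taken to be 11, the decay factor is modelled as loadfac/(loadfac+1), and decay_int and decay_fixpt are hypothetical helper names rather than the kernel's own routines.

```c
#include <assert.h>
#include <stdint.h>

#define FSHIFT 11                 /* assumed number of fractional bits */
#define FSCALE (1 << FSHIFT)

typedef uint32_t fixpt_t;

/* Integer decay: truncation always removes at least 1 when estcpu > 0. */
static unsigned
decay_int(unsigned loadfac, unsigned estcpu)
{
    return (loadfac * estcpu) / (loadfac + 1);
}

/* Fixed-point decay: both arguments are scaled by FSCALE. */
static fixpt_t
decay_fixpt(fixpt_t loadfac, fixpt_t estcpu)
{
    return (fixpt_t)(((uint64_t)loadfac * estcpu) / (loadfac + FSCALE));
}
```

With a load factor of 40, one integer tick decays to 0, so the tick is lost entirely; the same tick in fixed point decays from 2048 to 1998, so most of it survives until the next schedclock() increment.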


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
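
The marker-skipping iteration can be sketched as below. This is not the actual PROCLIST_FOREACH definition: TOY_MARKER, TOY_PROCLIST_FOREACH and the singly linked toy_proc list are hypothetical stand-ins that only show how an iteration macro can hide marker entries from its callers.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MARKER 0x1            /* hypothetical flag for marker entries */

struct toy_proc {
    int p_flag;
    int p_pid;
    struct toy_proc *p_next;
};

/* Visit every real entry on the list; marker entries are skipped. */
#define TOY_PROCLIST_FOREACH(var, head)                             \
    for ((var) = (head); (var) != NULL; (var) = (var)->p_next)      \
        if ((var)->p_flag & TOY_MARKER) continue; else

/* Example consumer: add up the pids of all real entries. */
static int
toy_sum_pids(struct toy_proc *head)
{
    struct toy_proc *p;
    int sum = 0;

    TOY_PROCLIST_FOREACH(p, head)
        sum += p->p_pid;
    return sum;
}
```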


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep awaiting the debugger's opinion
about the signal; this is not a problem. The hook is hence located in
issignal, at the place where normally SIGCHLD is sent to the debugger
while the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception), and stop the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.
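
The control flow described above can be sketched as follows. This is a toy model, not the kernel interface: toy_mi_switch, toy_chooseproc and the one-slot run queue are hypothetical, and unlike the chooseproc() described in the commit, the toy version returns NULL on an empty queue instead of waiting.

```c
#include <assert.h>
#include <stddef.h>

struct toy_proc {
    int p_pid;
};

static struct toy_proc *toy_runqueue;   /* hypothetical one-slot run queue */

/* Pick the next process to run; NULL if the queue is empty. */
static struct toy_proc *
toy_chooseproc(void)
{
    struct toy_proc *p = toy_runqueue;

    toy_runqueue = NULL;
    return p;
}

/*
 * Switch away from cur. A non-NULL newp names the process to run next;
 * NULL asks the scheduler to choose. Returns the process now running.
 */
static struct toy_proc *
toy_mi_switch(struct toy_proc *cur, struct toy_proc *newp)
{
    if (newp == NULL)
        newp = toy_chooseproc();
    if (newp == NULL || newp == cur)
        return cur;     /* "switching to self": no context to save/restore */
    return newp;
}
```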


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct emul to provide an emulation-specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.
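
The version-based trampoline selection can be sketched as below. This is an illustration only: toy_sigdesc, its member names, toy_kernel_tramp and toy_select_tramp are hypothetical and do not reproduce the layout of the real sigact_sigdesc structure.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sigdesc {
    void (*sd_handler)(int);    /* the signal handler itself */
    const void *sd_tramp;       /* userland trampoline, unused for version 0 */
    int sd_vers;                /* 0 = legacy kernel-provided trampoline */
};

/* Stand-in for the trampoline code the kernel copies out itself. */
static const char toy_kernel_tramp[] = "legacy";

/* Pick the trampoline that signal delivery should return through. */
static const void *
toy_select_tramp(const struct toy_sigdesc *sd)
{
    if (sd->sd_vers == 0)
        return toy_kernel_tramp;
    return sd->sd_tramp;
}
```

In this model sendsig() would look the descriptor up in the target's sigacts and call the selector, rather than receiving the handler as an argument.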


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specificities should now be handled in OS specific code. Linux
has been done, but other emulations still need to be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data that has been collected so far:

                                    Net  Free Open Linux SunOS AIX  OSF1 Darwin
send SIGIO to write end of pipe     Y    N    N    N     N     N    Y    Y
send SIGIO to read end of pipe      Y    Y    N    N     N     ?    Y    ?
send SIGIO to write end of socket   Y    Y    Y    N     N     Y    Y    Y
send SIGIO to read end of socket    Y    Y    Y    Y     Y     ?    Y    ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke several
emulation package builds


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN, clear the P_SUSPEND
flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
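
As a rough illustration of the compatibility shim described in 1.98, the old
four-argument call can be written as a macro that passes a null interlock.
The stub ltsleep() body and the struct simplelock stand-in below are
assumptions made so the sketch compiles outside the kernel; they are not the
kernel's implementation.

```c
#include <stddef.h>

/* Stand-in for the kernel's simple lock type (assumption for this sketch). */
struct simplelock { int lock_data; };

/* Bookkeeping so the sketch can be checked from a test. */
static volatile struct simplelock *last_interlock;
static int ltsleep_calls;

/*
 * Stub with the five-argument shape described above. A real kernel would
 * release `interlock` once the scheduler is locked and then sleep; this
 * stub only records its arguments and returns 0.
 */
static int
ltsleep(void *ident, int priority, const char *wmesg, int timo,
    volatile struct simplelock *interlock)
{
	(void)ident; (void)priority; (void)wmesg; (void)timo;
	last_interlock = interlock;
	ltsleep_calls++;
	return 0;
}

/* Old API: the same call with no interlock. */
#define tsleep(ident, pri, wmesg, timo) \
	ltsleep((ident), (pri), (wmesg), (timo), NULL)
```

Callers written against the old tsleep() keep compiling unchanged, while new
callers can hand ltsleep() the lock that protects their wait condition.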


# 1.97 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.
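
The element-size argument of KERN_PROC2 can be pictured with a small sketch:
each record is copied into a slot whose size the caller chose, so an old
binary built against a smaller structure still gets well-formed records.
The structure and function names here are hypothetical and the real
struct kinfo_proc2 is far larger; this is only a model of the idea, not the
sysctl handler.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record; stands in for a structure that grew over time. */
struct kinfo_sketch {
	int pid;
	int uid;
	int added_later;	/* field an older binary does not know about */
};

/*
 * Copy up to `count` records into `out`, which the caller laid out with
 * `elem_size` bytes per record. Only the leading
 * min(elem_size, sizeof(struct kinfo_sketch)) bytes of each record are
 * written; any extra bytes in a larger caller slot are zeroed.
 * Returns the number of records copied.
 */
static size_t
copy_records(const struct kinfo_sketch *src, size_t count,
    void *out, size_t elem_size, size_t out_len)
{
	size_t n, ncopy = sizeof(*src) < elem_size ? sizeof(*src) : elem_size;
	unsigned char *dst = out;

	if (elem_size == 0)
		return 0;
	for (n = 0; n < count && (n + 1) * elem_size <= out_len; n++) {
		memset(dst + n * elem_size, 0, elem_size);
		memcpy(dst + n * elem_size, &src[n], ncopy);
	}
	return n;
}
```

Because new fields are only ever appended in this model, truncating each
record to the caller's element size keeps the older layout intact.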

With input and suggestions from many people on tech-kern.


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
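
The sizing rule in 1.86 amounts to a divide followed by a clamp. A minimal
sketch, assuming sizes are counted in pages and using a hypothetical
function name (the bounds correspond to what NKMEMPAGES_MIN and
NKMEMPAGES_MAX would supply):

```c
#include <stddef.h>

/*
 * Start from a quarter of physical memory (in pages) and clamp the
 * result to [min_pages, max_pages].
 */
static size_t
nkmempages_sketch(size_t physmem_pages, size_t min_pages, size_t max_pages)
{
	size_t npages = physmem_pages / 4;

	if (npages > max_pages)
		npages = max_pages;
	if (npages < min_pages)
		npages = min_pages;
	return npages;
}
```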


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). The process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
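
A rough sketch of the state check introduced in 1.79 follows. The numeric
state values and the proc_sketch/reap_sketch names are assumptions for
illustration; only the idea that P_ZOMBIE() accepts both SZOMB and SDEAD
comes from the message above.

```c
/* Illustrative state values; the numbering here is an assumption. */
enum proc_state_sketch { SIDL = 1, SRUN, SSLEEP, SSTOP, SZOMB, SDEAD };

/* Minimal stand-in for struct proc, holding only the state field. */
struct proc_sketch { int p_stat; };

/* A process counts as a zombie in either the SZOMB or the SDEAD state. */
#define P_ZOMBIE(p) ((p)->p_stat == SZOMB || (p)->p_stat == SDEAD)

/* Model of the reaper's state change: SDEAD becomes SZOMB. */
static void
reap_sketch(struct proc_sketch *p)
{
	if (p->p_stat == SDEAD)
		p->p_stat = SZOMB;
}
```

Code that only needs to know "is this process gone?" can test P_ZOMBIE()
without caring whether the reaper has run yet.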


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is the
header files as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adopt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location, and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.364 29-Apr-2020 thorpej

- proc_find() retains traditional semantics of requiring the canonical
PID to look up a proc. Add a separate proc_find_lwpid() to look up a
proc by the ID of any of its LWPs.
- Add proc_find_lwp_acquire_proc(), which enables looking up the LWP
*and* a proc given the ID of any LWP. Returns with the proc::p_lock
held.
- Rewrite lwp_find2() in terms of proc_find_lwp_acquire_proc(), and
allow the proc to be wildcarded, rather than just curproc or specific
proc.
- lwp_find2() now subsumes the original intent of lwp_getref_lwpid(), but
in a much nicer way, so garbage-collect the remnants of that recently
added mechanism.


Revision tags: bouyer-xenpvh-base2
# 1.363 24-Apr-2020 thorpej

Overhaul the way LWP IDs are allocated. Instead of each LWP having its
own LWP ID space, LWP IDs now come from the same number space as PIDs. The
lead LWP of a process gets the PID as its LID. If a multi-LWP process's
lead LWP exits, the PID persists for the process.

In addition to providing system-wide unique thread IDs, this also lets us
eliminate the per-process LWP radix tree, and some associated locks.

Remove the separate "global thread ID" map added previously; it is no longer
needed to provide this functionality.

Nudged in this direction by ad@ and chs@.


Revision tags: phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.362 06-Apr-2020 kamil

branches: 1.362.2;
Reintroduce struct proc::p_oppid

Relying on p_opptr is not safe as there is a race between:
- spawner giving birth to a child process and being killed
- spawnee accessing p_opptr and reporting TRAP_CHLD

PR kern/54786 by Andreas Gustafsson


# 1.361 05-Apr-2020 christos

There is no "s" lock.


# 1.360 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.359 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.358 29-Jan-2020 ad

- Track LWPs in a per-process radixtree. It uses no extra memory in the
single threaded case. Replace scans of p->p_lwps with lookups in the
tree. Find free LIDs for new LWPs in the tree. Replace the hashed sleep
queues for park/unpark with lookups in the tree under cover of a RW lock.

- lwp_wait(): if waiting on a specific LWP, find the LWP via tree lookup and
return EINVAL if it's detached, not ESRCH.

- Group the locks in struct proc at the end of the struct in their own cache
line.

- Add some comments.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.357 12-Oct-2019 kamil

branches: 1.357.2;
Remove now unused p_oppid from struct proc


# 1.356 30-Sep-2019 kamil

Move TRAP_CHLD/TRAP_LWP ptrace information from struct proc to siginfo

Storing struct ptrace_state information inside struct proc was vulnerable
to synchronization bugs, as multiple events emitted at the same time were
overwriting other ones.

Cache the original parent process id in p_oppid. Reusing here p_opptr is
in theory prone to a slight race condition.

Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for calls
prompting for the value in cases when there wasn't registered an
appropriate event.

Add an alternative approach to check the ptrace_state information, directly
from the siginfo_t value returned from PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compat with older NetBSD and
OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.

Add a couple of compile-time asserts for assumptions in the code.

No functional change intended in existing ptrace(2) software.

All ATF ptrace(2) and ATF GDB tests pass.

This change improves reliability of the threading ptrace(2) code.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.355 15-Jul-2019 pgoyette

Move a comment line to get it next to the line it describes, avoiding
intervening unrelated text.

NFCI


# 1.354 21-Jun-2019 kamil

Eliminate PS_NOTIFYSTOP remnants from the kernel

This flag used to be useful in /proc (BSD4.4-style) debugging semantics.
Traced child events were notified without signaling the parent.

This property was removed in NetBSD-8.0 and had no users.

This change simplifies the signal code, removing dead branches.

NFCI


# 1.353 11-Jun-2019 kamil

Add support for PTRACE_POSIX_SPAWN to report posix_spawn(3) events

posix_spawn(3) is a first class syscall in NetBSD, different to
(V)FORK+EXEC as these operations are executed in one go. This differs from
Linux and FreeBSD, where posix_spawn(3) is implemented with existing kernel
primitives (clone(2), vfork(2), exec(3)) inside libc.

Typically LLDB and GDB software is aware of FORK/VFORK events. As discussed
with the LLDB community, instead of slicing the posix_spawn(3) operation
into phases emulating (V)FORK+EXEC(+VFORK_DONE) and returning intermediate
state to the debugger, that might have abnormal state, introduce a new event
type: PTRACE_POSIX_SPAWN.

A debugger implementor can easily map it into existing fork+exec semantics
or treat as a distinct event.

There is no functional change for existing debuggers as there was no
support for reporting posix_spawn(3) events on the kernel side.


Revision tags: phil-wifi-20190609 isaki-audio2-base
# 1.352 06-Apr-2019 kamil

Centralized shared part of child_return() into MI part

Add a new function md_child_return() for MD specific bits only.

New child_return() is now part of MI and central code that handles
uniformly tracing code (KTR and ptrace(2)).

Synchronize value passed to ktrsysret() among ports to SYS_fork. This is
a traditional value and accessing p_lflag to check for PL_PPWAIT shall
use locking against proc_lock. Returning SYS_fork vs SYS_vfork still isn't
correct enough as there are more entry points to forking code. Instead of
making it too good, just settle with plain SYS_fork for all ports.


# 1.351 01-Mar-2019 christos

PR/53998: Joel Bertrand: Limit the number of semaphores on a
per-user basis not a per-process. We cannot really keep track on
a per-process basis because a parent process can create the semaphore
and a child can free it taking credit for it. There is also a
similar issue about resource exhaustion if we limited the number
of lwps per process as opposed to per user (which we don't).


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.350 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
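
The tri-state policy listed above can be modelled as a small decision
function. The enumerator and function names are hypothetical, and this is
not the kernel's get_expose_address(); it only restates the three settings
and the two defaults as code.

```c
#include <stdbool.h>

/* The three settings of kern.expose_address as described above. */
enum expose_sketch {
	EXPOSE_NONE = 0,		/* no access */
	EXPOSE_KMEM_OPENERS = 1,	/* processes with /dev/kmem open */
	EXPOSE_ALL = 2			/* access to everyone */
};

/* Default setting: 0 on KASLR kernels, 1 otherwise. */
static enum expose_sketch
default_expose_sketch(bool kaslr)
{
	return kaslr ? EXPOSE_NONE : EXPOSE_KMEM_OPENERS;
}

/* Decide whether a requester may be shown kernel addresses. */
static bool
may_see_addresses(enum expose_sketch setting, bool has_kmem_open)
{
	switch (setting) {
	case EXPOSE_ALL:
		return true;
	case EXPOSE_KMEM_OPENERS:
		return has_kmem_open;
	default:
		return false;
	}
}
```

Evaluating the decision once per sysctl request, as the message describes,
means a loop over processes can reuse the result instead of recomputing it
for every entry.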


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.349 10-Aug-2018 pgoyette

Allow syscall_establish() to install new syscalls when the existing
entry-point is either sys_nomodule or sys_nosys. Update the
makesyscalls.sh script to create a const array of bits to allow
syscall_disestablish() to properly restore the original entry-point.
Update all the initializers of struct emul to initialize the pointer
to the bit array struct emul.

XXX Regen of all files created by makesyscalls.sh will come soon,
XXX followed by a kernel version bump (since struct emul is being
XXX modified).

This commit should address PR kern/45781 and also removes the need
for the work-around for that PR in file

sys/arch/usermode/modules/syscallemu/syscallemu.c


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.348 09-May-2018 kre

branches: 1.348.2;

Cause a process's user and system times to become non-decreasing.

This alters the invented values (ie: statistically calculated)
that are returned - for small values, the values are likely going to
be different than they were, but that's largely nonsense anyway
(except that the sum of utime & stime does equal cpu time consumed
by the process). Once the values get large enough to be meaningful
the difference made by this change will be in the noise, and irrelevant.

This needs a couple of additions to struct proc, so we are now into 8.99.17


# 1.347 06-May-2018 kamil

Remove an element from struct emul: e_tracesig

e_tracesig used to be implemented for Darwin compat. Nowadays the Darwin
compatibility layer is gone and there are no other users.

This functionality isn't used where it shall be used in the existing
codebase.

If we want to emulate debugging interfaces in compat layers we would need
to implement that from scratch anyway. We would need to be bug compatible
with other OSes too.

Proposed on tech-kern@.

Welcome to NetBSD 8.99.16!

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422
# 1.346 19-Apr-2018 christos

s/static inline/static __inline/g for consistency with other include
headers.


# 1.345 16-Apr-2018 kamil

Remove the rnewprocp argument from fork1(9)

It's now unused and it can cause use-after-free scenarios as noted by
<Mateusz Guzik>.

Reference: http://mail-index.netbsd.org/tech-kern/2017/09/08/msg022267.html

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.344 09-Jan-2018 maya

branches: 1.344.2;
remove struct emul's e_fault.

It used to be used by COMPAT_IRIX for the purpose of overriding
uvm_fault (only implemented in MIPS), now removed.

Ride 8.99.12 version bump.


Revision tags: tls-maxphys-base-20171202
# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->p_path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

branches: 1.340.6;
factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API and its usage.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, so its traces are removed without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
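
The size argument above can be checked at compile time. The following is a
minimal sketch, not the kernel header itself: the *_sketch typedefs and the
_old/_new struct names are hypothetical stand-ins, and both ID types are
assumed to be int32_t as the message states.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>	/* offsetof */
#include <stdint.h>

/* Hypothetical stand-ins for pid_t and lwpid_t, assumed int32_t-like. */
typedef int32_t pid_sketch_t;
typedef int32_t lwpid_sketch_t;

/* Layout before the change. */
struct ptrace_state_old {
	int pe_report_event;
	pid_sketch_t pe_other_pid;
};

/* Layout after the change: the second member becomes a union. */
struct ptrace_state_new {
	int pe_report_event;
	union {
		pid_sketch_t _pe_other_pid;
		lwpid_sketch_t _pe_lwp;
	} _option;
};

/* Same size and same member offset, so prebuilt binaries keep working. */
static_assert(sizeof(struct ptrace_state_old) ==
    sizeof(struct ptrace_state_new), "size must not change");
static_assert(offsetof(struct ptrace_state_old, pe_other_pid) ==
    offsetof(struct ptrace_state_new, _option._pe_other_pid),
    "offset must not change");

/* Accessor macros keep the old member name usable in source code. */
#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

/* Read the LWP id reported with a PTRACE_LWP_* style event. */
static int32_t
state_lwp(const struct ptrace_state_new *st)
{
	return st->pe_lwp;
}
```

Because both union members have the same integer type, code that still
writes pe_other_pid and code that reads pe_lwp see the same storage.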


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP termination is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
parent gives birth to new process child and stops till it exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE notifies a debugger that a parent has
resumed after a vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
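
As a rough illustration of why the split helps, the sketch below rebuilds a
traditional wait(2)-style status word from three separate fields. It is not
the kernel code: struct xstatus_sketch, the SK_* constants and
compose_wait_status() are hypothetical names, and the encoding shown (exit
code in bits 8-15, signal in the low 7 bits, core flag 0x80) is the
traditional BSD layout, assumed here.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stdbool.h>

#define SK_SIGMASK	0x7f	/* low 7 bits: terminating signal */
#define SK_COREFLAG	0x80	/* set if a core file was written */

/* The core flag must not overlap the signal number bits. */
static_assert((SK_COREFLAG & SK_SIGMASK) == 0, "flag overlaps signal bits");

/* Hypothetical stand-in for the three split pieces of state. */
struct xstatus_sketch {
	int xexit;		/* exit code, like p_xexit */
	int xsig;		/* signal number, like p_xsig; 0 if none */
	bool coredumped;	/* like the WCOREFLAG bit in p_sflag */
};

/* Compose a wait(2)-style status from the separate fields. */
static int
compose_wait_status(const struct xstatus_sketch *x)
{
	if (x->xsig != 0) {
		/* Killed by a signal: signal number plus optional core flag. */
		return (x->xsig & SK_SIGMASK) |
		    (x->coredumped ? SK_COREFLAG : 0);
	}
	/* Normal exit: exit code goes in the second byte. */
	return (x->xexit & 0xff) << 8;
}
```

Keeping the pieces separate means a reader of the fields never has to guess
whether a stored value is a composite status or a bare signal number.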


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture,
MAP_NOSYSCALLS is simply ignored with virtually no CPU cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes to sleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time.


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout from the proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but no longer do.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
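
The effect described above can be sketched in a few lines of C. This is only
an illustration, not the kernel code: the decay ratio 2*load/(2*load+1), the
FSHIFT value, and the function names decay_int() and decay_fix() are all
assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point format: 11 fractional bits (illustrative value). */
#define FSHIFT 11
#define FSCALE (1u << FSHIFT)

typedef uint32_t fixpt_t;

/*
 * Integer decay: scale by 2*load / (2*load + 1) and truncate.
 * Truncation removes at least 1 from any non-zero value, however
 * high the load is.
 */
static unsigned
decay_int(unsigned estcpu, unsigned load)
{
	assert(load > 0);
	return estcpu * (2 * load) / (2 * load + 1);
}

/* Fixed-point decay: same ratio, but the fractional part survives. */
static fixpt_t
decay_fix(fixpt_t estcpu, unsigned load)
{
	assert(load > 0);
	return (fixpt_t)(((uint64_t)estcpu * (2 * load)) / (2 * load + 1));
}
```

With a load of 20, decay_int(1, 20) drops straight to 0, while
decay_fix(1 * FSCALE, 20) keeps 1998 of the original 2048 units, which is
the slower decay the log entry is after.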


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
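
A rough sketch of the marker-skipping walk, under stated assumptions: the
node layout, the is_marker field, and every name ending in _sketch are
hypothetical, and the sketch leaves out the locking and marker insertion
that the real function is there to handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; is_marker flags a placeholder entry. */
struct entry_sketch {
	struct entry_sketch *next;
	int is_marker;
	int pid;
};

/*
 * Call fn on every real entry, skipping markers. In this sketch a
 * non-zero return from fn stops the walk and is passed back.
 */
static int
foreach_call_sketch(struct entry_sketch *head,
    int (*fn)(struct entry_sketch *, void *), void *arg)
{
	struct entry_sketch *e;
	int ret;

	assert(fn != NULL);
	for (e = head; e != NULL; e = e->next) {
		if (e->is_marker)
			continue;
		ret = fn(e, arg);
		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Example callback: count the entries visited. */
static int
count_entry_sketch(struct entry_sketch *e, void *arg)
{
	(void)e;
	++*(int *)arg;
	return 0;
}
```

Walking a three-node list whose middle node is a marker with
count_entry_sketch() visits two entries.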


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now the same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now a private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep while waiting for the debugger's
opinion about the signal; this is not a problem. The hook is hence located in
issignal, at the place where SIGCHLD is normally sent to the debugger
while the traced process is stopped. If the hook returns 0, we bypass
those operations, the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception), and stop the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)
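
As a hedged illustration of what such a macro is for: when the pid counter
wraps, allocation restarts above a low range rather than at 1. The function
next_pid_sketch() and the PID_MAX_SKETCH value are assumptions for this
example, not the kernel's allocator.

```c
#include <assert.h>

#define PID_SKIP_SKETCH 500	/* value quoted in the log entry */
#define PID_MAX_SKETCH 30000	/* assumed upper bound, for illustration */

/* Return the next candidate pid after 'last', wrapping past the low range. */
static int
next_pid_sketch(int last)
{
	int pid;

	assert(last >= 0 && last < PID_MAX_SKETCH);
	pid = last + 1;
	if (pid >= PID_MAX_SKETCH)
		pid = PID_SKIP_SKETCH;	/* wrap, skipping the low pids */
	return pid;
}
```

A real allocator would also have to check that the candidate is not already
in use; the sketch shows only the wraparound.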


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.
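
The calling convention described above might be sketched as follows. All
names ending in _sketch are hypothetical, the one-slot run queue stands in
for the real queues, and the stand-in for chooseproc() does not wait as the
real one is described to do.

```c
#include <assert.h>
#include <stddef.h>

struct sw_proc { int pid; };

/* Hypothetical one-slot run queue standing in for the real queues. */
static struct sw_proc *runq_sketch;

/* Stand-in for chooseproc(): take whatever is queued (no waiting here). */
static struct sw_proc *
chooseproc_sketch(void)
{
	struct sw_proc *p = runq_sketch;

	runq_sketch = NULL;
	return p;
}

/*
 * Decide what to run next. newp == NULL asks the scheduler to choose.
 * *need_switch is set to 0 when the choice is the current process (or
 * nothing is runnable), so the context switch can be skipped.
 */
static struct sw_proc *
switch_target_sketch(struct sw_proc *cur, struct sw_proc *newp,
    int *need_switch)
{
	assert(cur != NULL && need_switch != NULL);
	if (newp == NULL)
		newp = chooseproc_sketch();
	*need_switch = (newp != NULL && newp != cur);
	return newp;
}
```

Passing the current process as newp is the "switching to self" case: the
function returns it unchanged and reports that no switch is needed.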


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct emul to provide an emulation-specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
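
The dispatch this entry describes, a NULL hook meaning "use the default
handler", can be sketched as below. The types and every name ending in
_sketch are hypothetical simplifications; the real prototypes are the two
quoted above.

```c
#include <assert.h>
#include <stddef.h>

struct map_sketch { int faults; int propagated; };
struct fproc_sketch;

/* Hook type: takes the process, while the default handler takes a map. */
typedef int (*e_fault_sketch_t)(struct fproc_sketch *, unsigned long);

struct emul_sketch { e_fault_sketch_t e_fault; };

struct fproc_sketch {
	const struct emul_sketch *p_emul;
	struct map_sketch *p_map;
};

/* Stand-in for uvm_fault(): record the fault and report success. */
static int
default_fault_sketch(struct map_sketch *map, unsigned long va)
{
	(void)va;
	map->faults++;
	return 0;
}

/* Example hook: handle the fault, then note that it was propagated. */
static int
irix_like_fault_sketch(struct fproc_sketch *p, unsigned long va)
{
	int error = default_fault_sketch(p->p_map, va);

	if (error == 0)
		p->p_map->propagated++;
	return error;
}

/* Trap-handler side: use the emulation hook if set, else the default. */
static int
handle_fault_sketch(struct fproc_sketch *p, unsigned long va)
{
	assert(p != NULL && p->p_emul != NULL);
	if (p->p_emul->e_fault != NULL)
		return p->p_emul->e_fault(p, va);
	return default_fault_sketch(p->p_map, va);
}
```

Taking the process rather than the map lets the hook reach per-process
emulation state, which is the reason given above for the differing
prototype.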


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.
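
A minimal sketch of the version-based selection, assuming a per-signal
descriptor that carries the trampoline pointer and version. The struct
layout, field names, and all functions ending in _sketch are illustrative
and not taken from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*tramp_fn_sketch)(void);

/* Hypothetical per-signal descriptor: handler plus trampoline info. */
struct sigdesc_sketch {
	void (*sd_handler)(int);
	tramp_fn_sketch sd_tramp;	/* userland trampoline, or NULL */
	int sd_vers;			/* 0 = legacy kernel trampoline */
};

/* Stand-in for the kernel-provided (legacy) trampoline. */
static void
kernel_tramp_sketch(void)
{
}

/* Stand-in for a trampoline registered by userland. */
static void
user_tramp_sketch(void)
{
}

/* Pick the trampoline for a signal based on the registered version. */
static tramp_fn_sketch
select_tramp_sketch(const struct sigdesc_sketch *sd)
{
	assert(sd != NULL);
	if (sd->sd_vers == 0)
		return kernel_tramp_sketch;
	assert(sd->sd_tramp != NULL);	/* versions > 0 must supply one */
	return sd->sd_tramp;
}
```

A descriptor with version 0 selects the kernel stand-in; any other version
selects whatever trampoline was registered with the handler.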


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has an e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specifics should now be handled in OS specific code. Linux
has been done, but other emulations should be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data that has been collected so far:

                                   Net  Free Open Linux SunOS AIX  OSF1 Darwin
send SIGIO to write end of pipe    Y    N    N    N     N     N    Y    Y
send SIGIO to read end of pipe     Y    Y    N    N     N     ?    Y    ?
send SIGIO to write end of socket  Y    Y    Y    N     N     Y    Y    Y
send SIGIO to read end of socket   Y    Y    Y    Y     Y     ?    Y    ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke several
emulation package builds


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependant switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.
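
A minimal sketch of how the hooks and p_emuldata described above could fit
together. The struct layouts are simplified stand-ins, and the "demo"
emulation, demo_live counter and exec_set_emul() helper are hypothetical
names used only for illustration; only e_proc_fork, e_proc_exec,
e_proc_exit and p_emuldata come from this log entry.

```c
#include <assert.h>
#include <stddef.h>

struct proc;

/* Simplified stand-in for struct emul; the real one has more members. */
struct emul {
	const char *e_name;
	void (*e_proc_fork)(struct proc *child, struct proc *parent);
	void (*e_proc_exec)(struct proc *p);
	void (*e_proc_exit)(struct proc *p);
};

/* Simplified stand-in for struct proc. */
struct proc {
	const struct emul *p_emul;
	void *p_emuldata;	/* per-process emulation-specific data */
};

/* Hypothetical emulation state: processes currently using "demo". */
static int demo_live;

static void
demo_fork(struct proc *child, struct proc *parent)
{
	child->p_emuldata = parent->p_emuldata;
	demo_live++;
}

static void
demo_exec(struct proc *p)
{
	if (p->p_emuldata == NULL) {	/* first exec under this emulation */
		p->p_emuldata = &demo_live;
		demo_live++;
	}
}

static void
demo_exit(struct proc *p)
{
	p->p_emuldata = NULL;
	demo_live--;
}

static const struct emul emul_native = { "native", NULL, NULL, NULL };
static const struct emul emul_demo = { "demo", demo_fork, demo_exec, demo_exit };

/*
 * Hypothetical exec-time helper: run the exit hook of the old emulation
 * if the emulation changes, then the exec hook of the new one.
 */
static void
exec_set_emul(struct proc *p, const struct emul *new_emul)
{
	if (p->p_emul != NULL && p->p_emul != new_emul &&
	    p->p_emul->e_proc_exit != NULL)
		p->p_emul->e_proc_exit(p);
	p->p_emul = new_emul;
	if (new_emul->e_proc_exec != NULL)
		new_emul->e_proc_exec(p);
}
```

The hooks are optional in this sketch (NULL means "nothing to do"), which
keeps the native emulation free of boilerplate.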


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
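
A sketch of how the old API can be layered on the new one. The ltsleep()
body here is a stub that only models "release the interlock", and
struct simplelock is a hypothetical one-field stand-in; the parameter
names are assumptions, not the kernel prototype.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's simple lock type. */
struct simplelock {
	int lock_held;
};

/*
 * Stub for illustration only: a real ltsleep() would lock the scheduler,
 * drop the interlock, and block until wakeup or timeout.
 */
static int
ltsleep(const void *ident, int priority, const char *wmesg, int timo,
    struct simplelock *interlock)
{
	(void)ident;
	(void)priority;
	(void)wmesg;
	(void)timo;
	if (interlock != NULL)
		interlock->lock_held = 0;	/* released on behalf of the caller */
	return 0;
}

/* Old API expressed via the new one: there is no interlock to release. */
#define	tsleep(ident, priority, wmesg, timo) \
	ltsleep((ident), (priority), (wmesg), (timo), NULL)
```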


# 1.97 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created processes (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* function need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.
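
The "caller passes the element size" idea from KERN_PROC2 can be sketched
as below. struct kinfo_demo and copy_records() are hypothetical; the
truncate-or-zero-pad behaviour is an assumption about how a fixed element
size keeps old binaries working, not a description of the actual sysctl
handler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record; the real struct kinfo_proc2 has many more fields. */
struct kinfo_demo {
	int kd_pid;
	int kd_stat;
	int kd_cpuid;	/* imagine this was appended in a later kernel */
};

/*
 * Copy up to 'count' records into 'out', writing exactly 'elemsize' bytes
 * per record: newer trailing fields are cut off for old callers, and the
 * tail is zero-filled for callers that ask for more than the kernel has.
 * Returns the number of records written.
 */
static size_t
copy_records(void *out, size_t elemsize, const struct kinfo_demo *src,
    size_t nsrc, size_t count)
{
	size_t n = nsrc < count ? nsrc : count;
	size_t ncopy = elemsize < sizeof(*src) ? elemsize : sizeof(*src);
	char *dst = out;

	for (size_t i = 0; i < n; i++) {
		memset(dst, 0, elemsize);
		memcpy(dst, &src[i], ncopy);
		dst += elemsize;
	}
	return n;
}
```

New fields can then only be appended to the record, never inserted in the
middle, or old binaries would read them at the wrong offsets.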


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
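
The sizing rule amounts to a divide and a clamp. A sketch, with a
hypothetical function name and the bounds passed as plain parameters
rather than taken from NKMEMPAGES_MIN/NKMEMPAGES_MAX:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Auto-size sketch: a quarter of physical memory (in pages), clamped to
 * the [min_pages, max_pages] range supplied by machine-dependent code
 * or the kernel config.
 */
static size_t
nkmempages_autosize(size_t physpages, size_t min_pages, size_t max_pages)
{
	size_t n = physpages / 4;

	if (n > max_pages)
		n = max_pages;
	if (n < min_pages)
		n = min_pages;
	return n;
}
```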


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resources limits). The process is designated by its
pid at the second level name. These values are inherited on fork, and the
corename format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.
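
A sketch of the selection rule, assuming the BSD convention that a
numerically lower priority is better and that ties go to the sleeper that
has been queued longest. The flat array queue and all names here are
hypothetical; the kernel uses hashed sleep queues.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sleeper record, listed in queue (arrival) order. */
struct sleeper {
	const void *ident;	/* wait channel */
	int priority;		/* lower value = higher priority */
	int awake;
};

/*
 * Wake exactly one sleeper on 'ident': the best priority wins, and the
 * strict '<' keeps the earliest queued entry on ties. Returns the
 * sleeper that was woken, or NULL if nobody waits on 'ident'.
 */
static struct sleeper *
wakeup_one_sketch(struct sleeper *q, size_t n, const void *ident)
{
	struct sleeper *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (q[i].awake || q[i].ident != ident)
			continue;
		if (best == NULL || q[i].priority < best->priority)
			best = &q[i];
	}
	if (best != NULL)
		best->awake = 1;
	return best;
}
```

Compared with waking every sleeper on the channel, only one process is
made runnable per event, so the others do not wake just to sleep again.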


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
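
A sketch of the two-stage exit state described above. The numeric state
values, the one-field struct proc, and the reaper_process() and
wait4_can_collect() helpers are hypothetical; only SDEAD, SZOMB and
P_ZOMBIE() are named in this log entry.

```c
#include <assert.h>

/* Hypothetical state values; only the names come from the log. */
enum proc_state { SRUN = 1, SSLEEP, SSTOP, SZOMB, SDEAD };

/* Simplified stand-in for struct proc. */
struct proc {
	enum proc_state p_stat;
};

/* Dead-but-unreaped and fully zombie procs are both "zombies". */
#define	P_ZOMBIE(p)	((p)->p_stat == SZOMB || (p)->p_stat == SDEAD)

/* The reaper finishes teardown and hands the proc over to wait4. */
static void
reaper_process(struct proc *p)
{
	if (p->p_stat == SDEAD)
		p->p_stat = SZOMB;
}

/* Only a proc the reaper has already handled can be collected. */
static int
wait4_can_collect(const struct proc *p)
{
	return p->p_stat == SZOMB;
}
```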


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).
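
The exit-signal rules above, as a sketch. The p_exitsig field name and
the two helper functions are assumptions made for illustration; the log
only names P_EXITSIG() and the "0 means no signal" convention.

```c
#include <assert.h>
#include <signal.h>

/* Simplified stand-in for struct proc. */
struct proc {
	int p_exitsig;	/* signal sent to the parent on exit; 0 = none */
};

/* Simplified accessor: no special case for initproc any more. */
#define	P_EXITSIG(p)	((p)->p_exitsig)

/* Reparenting to init switches the child back to plain SIGCHLD. */
static void
reparent_to_init(struct proc *child)
{
	child->p_exitsig = SIGCHLD;
}

/* Returns the signal to post to the parent, or 0 to post nothing. */
static int
exit_signal_to_post(const struct proc *p)
{
	return P_EXITSIG(p);
}
```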


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
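
A sketch of the biased representation: with NZERO at 20, a user-visible
nice value in [-20, 20] is stored as an unsigned 0..40. The helper names
and the NICE_MIN/NICE_MAX constants are hypothetical.

```c
#include <assert.h>

#define	NZERO		20	/* bias added to the user-visible nice value */
#define	NICE_MIN	(-20)	/* hypothetical names for the user range */
#define	NICE_MAX	20

/* Simplified stand-in for struct proc. */
struct proc {
	unsigned char p_nice;	/* always non-negative: nice + NZERO */
};

/* Clamp the requested nice value and store it biased. */
static void
proc_setnice(struct proc *p, int nice)
{
	if (nice < NICE_MIN)
		nice = NICE_MIN;
	if (nice > NICE_MAX)
		nice = NICE_MAX;
	p->p_nice = (unsigned char)(nice + NZERO);
}

/* Recover the user-visible value. */
static int
proc_getnice(const struct proc *p)
{
	return (int)p->p_nice - NZERO;
}
```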


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.
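
A sketch of hold/release as a counter. The p_holdcnt field name and the
proc_swappable() helper are assumptions, and the real macros may do more
work (for example, bringing a swapped-out process back in) than this
bare increment and decrement.

```c
#include <assert.h>

/* Simplified stand-in for struct proc. */
struct proc {
	int p_holdcnt;	/* while non-zero, the process must stay in core */
};

/* Take and drop a hold; holds nest, unlike a single flag bit. */
#define	PHOLD(p)	((void)((p)->p_holdcnt++))
#define	PRELE(p)	((void)((p)->p_holdcnt--))

/* Hypothetical check a swapper might make before picking a victim. */
static int
proc_swappable(const struct proc *p)
{
	return p->p_holdcnt == 0;
}
```

A counter lets two independent users (say, physio and a debugger) hold
the same process without one release cancelling the other's hold.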


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location , and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.363 24-Apr-2020 thorpej

Overhaul the way LWP IDs are allocated. Instead of each LWP having its
own LWP ID space, LWP IDs now come from the same number space as PIDs. The
lead LWP of a process gets the PID as its LID. If a multi-LWP process's
lead LWP exits, the PID persists for the process.

In addition to providing system-wide unique thread IDs, this also lets us
eliminate the per-process LWP radix tree, and some associated locks.

Remove the separate "global thread ID" map added previously; it is no longer
needed to provide this functionality.

Nudged in this direction by ad@ and chs@.


Revision tags: phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.362 06-Apr-2020 kamil

Reintroduce struct proc::p_oppid

Relying on p_opptr is not safe as there is a race between:
- spawner giving a birth to a child process and being killed
- spawnee accessing p_opptr and reporting TRAP_CHLD

PR kern/54786 by Andreas Gustafsson


# 1.361 05-Apr-2020 christos

There is no "s" lock.


# 1.360 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.359 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.358 29-Jan-2020 ad

- Track LWPs in a per-process radixtree. It uses no extra memory in the
single threaded case. Replace scans of p->p_lwps with lookups in the
tree. Find free LIDs for new LWPs in the tree. Replace the hashed sleep
queues for park/unpark with lookups in the tree under cover of a RW lock.

- lwp_wait(): if waiting on a specific LWP, find the LWP via tree lookup and
return EINVAL if it's detached, not ESRCH.

- Group the locks in struct proc at the end of the struct in their own cache
line.

- Add some comments.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.357 12-Oct-2019 kamil

branches: 1.357.2;
Remove now unused p_oppid from struct proc


# 1.356 30-Sep-2019 kamil

Move TRAP_CHLD/TRAP_LWP ptrace information from struct proc to siginfo

Storing struct ptrace_state information inside struct proc was vulnerable
to synchronization bugs, as multiple events emitted at the same time were
overwriting other ones.

Cache the original parent process id in p_oppid. Reusing here p_opptr is
in theory prone to a slight race condition.

Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for calls
prompting for the value in cases when there wasn't registered an
appropriate event.

Add an alternative approach to check the ptrace_state information, directly
from the siginfo_t value returned from PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compat with older NetBSD and
OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.

Add a couple of compile-time asserts for assumptions in the code.

No functional change intended in existing ptrace(2) software.

All ATF ptrace(2) and ATF GDB tests pass.

This change improves reliability of the threading ptrace(2) code.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.355 15-Jul-2019 pgoyette

Move a comment line to get it next to the line it describes, avoiding
intervening unrelated text.

NFCI


# 1.354 21-Jun-2019 kamil

Eliminate PS_NOTIFYSTOP remnants from the kernel

This flag used to be useful in /proc (BSD4.4-style) debugging semantics.
Traced child events were notified without signaling the parent.

This property was removed in NetBSD-8.0 and had no users.

This change simplifies the signal code, removing dead branches.

NFCI


# 1.353 11-Jun-2019 kamil

Add support for PTRACE_POSIX_SPAWN to report posix_spawn(3) events

posix_spawn(3) is a first class syscall in NetBSD, different to
(V)FORK+EXEC as these operations are executed in one go. This differs to
Linux and FreeBSD, where posix_spawn(3) is implemented with existing kernel
primitives (clone(2), vfork(2), exec(3)) inside libc.

Typically LLDB and GDB software is aware of FORK/VFORK events. As discussed
with the LLDB community, instead of slicing the posix_spawn(3) operation
into phases emulating (V)FORK+EXEC(+VFORK_DONE) and returning intermediate
state to the debugger, that might have abnormal state, introduce new event
type: PTRACE_POSIX_SPAWN.

A debugger implementor can easily map it into existing fork+exec semantics
or treat as a distinct event.

There is no functional change for existing debuggers as there was no
support for reporting posix_spawn(3) events on the kernel side.


Revision tags: phil-wifi-20190609 isaki-audio2-base
# 1.352 06-Apr-2019 kamil

Centralized shared part of child_return() into MI part

Add a new function md_child_return() for MD specific bits only.

New child_return() is now part of MI and central code that handles
uniformly tracing code (KTR and ptrace(2)).

Synchronize value passed to ktrsysret() among ports to SYS_fork. This is
a traditional value and accessing p_lflag to check for PL_PPWAIT shall
use locking against proc_lock. Returning SYS_fork vs SYS_vfork still isn't
correct enough as there are more entry points to forking code. Instead of
making it too good, just settle with plain SYS_fork for all ports.


# 1.351 01-Mar-2019 christos

PR/53998: Joel Bertrand: Limit the number of semaphores on a
per-user basis not a per-process. We cannot really keep track on
a per-process basis because a parent process can create the semaphore
and a child can free it taking credit for it. There is also a
similar issue about resource exhaustion if we limited the number
of lwps per process as opposed to per user (which we don't).


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.350 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
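
The tri-state policy listed above can be sketched as a small decision
function. This is an illustration only: the names demo_may_expose() and
demo_default_mode() and the enum are hypothetical and are not the kernel's
get_expose_address() implementation.

```c
#include <stdbool.h>

/* Hypothetical names for the three documented sysctl values. */
enum demo_expose {
	DEMO_EXPOSE_NONE = 0,	/* no access */
	DEMO_EXPOSE_KMEM = 1,	/* processes with /dev/kmem open */
	DEMO_EXPOSE_ALL = 2	/* everyone */
};

/* Decide whether a caller may see kernel addresses under a given mode. */
bool
demo_may_expose(int mode, bool has_kmem_open)
{
	switch (mode) {
	case DEMO_EXPOSE_ALL:
		return true;
	case DEMO_EXPOSE_KMEM:
		return has_kmem_open;
	default:
		return false;
	}
}

/* Documented defaults: 0 for KASLR kernels, 1 otherwise. */
int
demo_default_mode(bool kaslr)
{
	return kaslr ? DEMO_EXPOSE_NONE : DEMO_EXPOSE_KMEM;
}
```

Evaluating the mode once per sysctl call, rather than once per process,
matches the efficiency point made in the list.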


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.349 10-Aug-2018 pgoyette

Allow syscall_establish() to install new syscalls when the existing
entry-point is either sys_nomodule or sys_nosys. Update the
makesyscalls.sh script to create a const array of bits to allow
syscall_disestablish() to properly restore the original entry-point.
Update all the initializers of struct emul to initialize the pointer
to the bit array struct emul.
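
A rough sketch of the establish/disestablish idea described above. All
names here (demo_table, demo_nomodbits, demo_establish() and so on) are
hypothetical stand-ins, and the table is a toy; the real sysent layout and
the generated bit array differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*demo_sy_call_t)(void);

/* Stand-ins for the two placeholder entry points. */
int demo_nosys(void) { return -1; }
int demo_nomodule(void) { return -2; }
/* Example syscall body that a module might install. */
int demo_hello(void) { return 42; }

#define DEMO_NSYS 4

/* Bit n set: slot n originally pointed at demo_nomodule (here: slot 2). */
static const uint32_t demo_nomodbits[] = { 0x4 };

static demo_sy_call_t demo_table[DEMO_NSYS] = {
	demo_nosys, demo_nosys, demo_nomodule, demo_nosys
};

/* Install fn only if the slot still holds one of the placeholders. */
bool
demo_establish(size_t n, demo_sy_call_t fn)
{
	if (n >= DEMO_NSYS)
		return false;
	if (demo_table[n] != demo_nosys && demo_table[n] != demo_nomodule)
		return false;
	demo_table[n] = fn;
	return true;
}

/* Restore the original placeholder, chosen by the const bit array. */
void
demo_disestablish(size_t n)
{
	if (n >= DEMO_NSYS)
		return;
	bool nomod = (demo_nomodbits[n / 32] >> (n % 32)) & 1u;
	demo_table[n] = nomod ? demo_nomodule : demo_nosys;
}

demo_sy_call_t
demo_lookup(size_t n)
{
	return demo_table[n];
}
```

The bit array is what lets the restore step tell the two placeholders
apart after the slot has been overwritten.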

XXX Regen of all files created by makesyscalls.sh will come soon,
XXX followed by a kernel version bump (since struct emul is being
XXX modified).

This commit should address PR kern/45781 and also removes the need
for the work-around for that PR in file

sys/arch/usermode/modules/syscallemu/syscallemu.c


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.348 09-May-2018 kre

branches: 1.348.2;

Cause a process's user and system times to become non-decreasing.

This alters the invented values (ie: statistically calculated)
that are returned - for small values, the values are likely going to
be different than they were, but that's largely nonsense anyway
(except that the sum of utime & stime does equal cpu time consumed
by the process). Once the values get large enough to be meaningful
the difference made by this change will be in the noise, and irrelevant.

This needs a couple of additions to struct proc, so we are now into 8.99.17
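
One generic way to make a reported value non-decreasing is to remember the
last value handed out and never report less. The sketch below shows only
that general idea, with hypothetical names (struct demo_times,
demo_report_times()); it is not the kernel's actual calculation and does
not try to preserve the utime + stime sum mentioned above.

```c
#include <stdint.h>

/* Hypothetical per-process record of the last values reported. */
struct demo_times {
	uint64_t last_utime;
	uint64_t last_stime;
};

/*
 * Clamp freshly computed estimates so that neither goes backwards
 * relative to what was reported before, then remember the result.
 */
void
demo_report_times(struct demo_times *dt, uint64_t *utime, uint64_t *stime)
{
	if (*utime < dt->last_utime)
		*utime = dt->last_utime;
	if (*stime < dt->last_stime)
		*stime = dt->last_stime;
	dt->last_utime = *utime;
	dt->last_stime = *stime;
}
```

Keeping the previous values is consistent with the note that struct proc
needed a couple of new fields.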


# 1.347 06-May-2018 kamil

Remove an element from struct emul: e_tracesig

e_tracesig used to be implemented for Darwin compat. Nowadays the Darwin
compatibility layer is gone and there are no other users.

This functionality isn't used where it shall be used in the existing
codebase.

If we want to emulate debugging interfaces in compat layers we would need
to implement that from scratch anyway. We would need to be bug compatible
with other OSes too.

Proposed on tech-kern@.

Welcome to NetBSD 8.99.16!

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422
# 1.346 19-Apr-2018 christos

s/static inline/static __inline/g for consistency with other include
headers.


# 1.345 16-Apr-2018 kamil

Remove the rnewprocp argument from fork1(9)

It's now unused and it can cause use-after-free scenarios as noted by
<Mateusz Guzik>.

Reference: http://mail-index.netbsd.org/tech-kern/2017/09/08/msg022267.html

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.344 09-Jan-2018 maya

branches: 1.344.2;
remove struct emul's e_fault.

It used to be used by COMPAT_IRIX for the purpose of overriding
uvm_fault (only implemented in MIPS), now removed.

Ride 8.99.12 version bump.


Revision tags: tls-maxphys-base-20171202
# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had little number of users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

branches: 1.340.6;
factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API and its usage.

This replaces the previous watchpoint API. The previous one was introduced
only recently in NetBSD-current, so its remnants are removed without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these value varies across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
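
The size argument above can be checked at compile time. A minimal sketch,
assuming stand-in typedefs: demo_pid_t and demo_lwpid_t are hypothetical
names, both taken to be int32_t as the message describes, and the struct
names are likewise invented for the comparison.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for pid_t and lwpid_t; both assumed to be 32-bit integers. */
typedef int32_t demo_pid_t;
typedef int32_t demo_lwpid_t;

/* Layout before the change. */
struct demo_state_old {
	int pe_report_event;
	demo_pid_t pe_other_pid;
};

/* Layout after the change: the second member becomes a union. */
struct demo_state_new {
	int pe_report_event;
	union {
		demo_pid_t _pe_other_pid;
		demo_lwpid_t _pe_lwp;
	} _option;
};

/* The union must not change the size or move the second member. */
_Static_assert(sizeof(struct demo_state_old) == sizeof(struct demo_state_new),
    "structure size changed");
_Static_assert(offsetof(struct demo_state_old, pe_other_pid) ==
    offsetof(struct demo_state_new, _option),
    "member offset changed");
```

With equal size and offsets, old binaries reading pe_other_pid see the
same bytes as before, which is the binary-compatibility claim.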


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
parent gives birth to new process child and stops till it exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE is a notification to a debugger that a parent has
resumed after vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
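
A sketch of what such a pointer-size helper might look like. The flag
value, struct demo_proc, and DEMO_PROC_PTRSZ() are hypothetical stand-ins
chosen for illustration, not the definitions from <sys/proc.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag marking a 32-bit process on a 64-bit kernel. */
#define DEMO_PK_32	0x4

struct demo_proc {
	int p_flag;
};

/* Pointer size as seen by the process, not by the kernel. */
#define DEMO_PROC_PTRSZ(p) \
	(((p)->p_flag & DEMO_PK_32) ? sizeof(uint32_t) : sizeof(void *))

/* Bytes needed for an argv-style pointer vector plus its NULL terminator. */
size_t
demo_argv_bytes(const struct demo_proc *p, size_t argc)
{
	return (argc + 1) * DEMO_PROC_PTRSZ(p);
}
```

Code that copies pointer arrays to or from userland can then size them per
process instead of assuming the kernel's native pointer width.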


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
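
To illustrate the split, the sketch below rebuilds a traditional
wait(2)-style status word from the three separate pieces. The struct and
function names are hypothetical, and the bit layout is the conventional BSD
encoding (exit code in bits 8..15, signal in the low 7 bits, core flag
0200), assumed here rather than taken from this commit.

```c
#include <stdbool.h>

#define DEMO_WCOREFLAG	0200	/* core-dump bit in the traditional encoding */

/* Hypothetical holder for the three separate pieces of exit state. */
struct demo_exit_state {
	int xexit;		/* exit code, 0..255 */
	int xsig;		/* terminating signal, 0 if exited normally */
	bool coredumped;	/* set if the process dumped core */
};

/* Compose a composite status word from the separate fields. */
int
demo_wait_status(const struct demo_exit_state *es)
{
	if (es->xsig != 0)
		return (es->xsig & 0177) |
		    (es->coredumped ? DEMO_WCOREFLAG : 0);
	return (es->xexit & 0xff) << 8;
}
```

Keeping the parts separate means the composite value only has to be
computed where a wait status is actually reported.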


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adopt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give the same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's a more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but do no more.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
on each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.
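A rough sketch of the idea, not the committed code: the struct, field and
function names below are hypothetical, and the decay of the inherited share
over the child's lifetime (the reason updatepri was factored out) is left
out for brevity. It contrasts charging the whole child estimate to the
parent with charging only what the child accumulated on its own.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the scheduler fields in struct proc. */
struct sproc {
	unsigned estcpu;           /* estimated recent CPU usage */
	unsigned estcpu_inherited; /* share copied from the parent at fork */
};

/* Fork: the child starts with the parent's estimate and remembers it. */
static void
fork_hook(const struct sproc *parent, struct sproc *child)
{
	child->estcpu = parent->estcpu;
	child->estcpu_inherited = parent->estcpu;
}

/* Old behaviour: the whole child estimate is charged to the parent. */
static void
wait_hook_old(struct sproc *parent, const struct sproc *child)
{
	parent->estcpu += child->estcpu;
}

/* New behaviour: charge only what the child accumulated on its own. */
static void
wait_hook_new(struct sproc *parent, const struct sproc *child)
{
	if (child->estcpu > child->estcpu_inherited)
		parent->estcpu += child->estcpu - child->estcpu_inherited;
}

/* Run `cycles` fork-wait rounds with an idle child; return parent's estcpu. */
static unsigned
run_cycles(unsigned start, int cycles, int use_new)
{
	struct sproc parent = { start, 0 };
	struct sproc child;
	int i;

	assert(cycles >= 0);
	for (i = 0; i < cycles; i++) {
		fork_hook(&parent, &child);
		if (use_new)
			wait_hook_new(&parent, &child);
		else
			wait_hook_old(&parent, &child);
	}
	return parent.estcpu;
}
```

With an idle child, the old hook doubles the parent's value on every round,
while the new hook leaves it unchanged.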


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
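The effect can be illustrated with a small simulation. This is a sketch
under assumptions, not the kernel code: EST_SHIFT, EST_SCALE and the
function names are made up for illustration, and the decay factor is
assumed to be the classic BSD 2*load / (2*load + 1). Each simulated second
the process earns one tick of CPU and is then decayed once.

```c
#include <assert.h>
#include <stdint.h>

#define EST_SHIFT 11                 /* hypothetical fraction width */
#define EST_SCALE (1u << EST_SHIFT)  /* one tick in fixed-point units */

/* Assumed decay: estcpu * 2*load / (2*load + 1), truncating. */
static uint32_t
decay(uint32_t estcpu, uint32_t load)
{
	return (uint32_t)(((uint64_t)estcpu * 2 * load) / (2 * load + 1));
}

/* Plain integer estcpu: truncation always removes at least 1. */
static uint32_t
simulate_int(uint32_t load, int seconds)
{
	uint32_t est = 0;

	assert(seconds >= 0);
	for (int i = 0; i < seconds; i++)
		est = decay(est + 1, load);
	return est;
}

/* Fixed-point estcpu: the fraction survives, so usage can accumulate. */
static uint32_t
simulate_fixpt(uint32_t load, int seconds)
{
	uint32_t est = 0;

	assert(seconds >= 0);
	for (int i = 0; i < seconds; i++)
		est = decay(est + EST_SCALE, load);
	return est;
}
```

With a load of 20 the integer variant stays pinned at zero, while the
fixed-point variant grows past one full tick within a few seconds.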


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
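A minimal sketch of the marker technique, assuming a simple doubly linked
list rather than the kernel's proclist; every name here (struct node,
NODE_FOREACH, foreach_call and the helpers) is hypothetical. The iterator
parks a marker after the current entry before calling the callback, so it
can find its place again even if the callback unlinks that entry, and
ordinary iteration skips markers.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *prev, *next;
	int is_marker;   /* placeholder entry, not a real process */
	int pid;
};

static void
insert_after(struct node *pos, struct node *n)
{
	n->prev = pos;
	n->next = pos->next;
	if (pos->next != NULL)
		pos->next->prev = n;
	pos->next = n;
}

static void
remove_node(struct node *n)
{
	if (n->prev != NULL)
		n->prev->next = n->next;
	if (n->next != NULL)
		n->next->prev = n->prev;
	n->prev = n->next = NULL;
}

/* Plain iteration that skips markers (the list head is itself a marker). */
#define NODE_FOREACH(var, head) \
	for ((var) = (head)->next; (var) != NULL; (var) = (var)->next) \
		if (!(var)->is_marker)

/* Call fn for each real entry; stop early if fn returns non-zero. */
static int
foreach_call(struct node *head, int (*fn)(struct node *, void *), void *arg)
{
	struct node marker = { NULL, NULL, 1, -1 };
	struct node *n = head->next;
	int ret = 0;

	while (n != NULL && ret == 0) {
		if (n->is_marker) {
			n = n->next;
			continue;
		}
		insert_after(n, &marker);   /* remember our position */
		ret = fn(n, arg);           /* callback may unlink n */
		n = marker.next;
		remove_node(&marker);
	}
	return ret;
}

/* Example callback: add up pids and unlink even ones while iterating. */
static int
sum_and_drop_even(struct node *n, void *arg)
{
	*(int *)arg += n->pid;
	if (n->pid % 2 == 0)
		remove_node(n);
	return 0;
}

/* Build head -> 1 -> 2 -> 3, run the callback, report sum and survivors. */
static int
demo(int *remaining)
{
	struct node head = { NULL, NULL, 1, -1 };
	struct node a = { NULL, NULL, 0, 1 };
	struct node b = { NULL, NULL, 0, 2 };
	struct node c = { NULL, NULL, 0, 3 };
	struct node *it;
	int sum = 0;

	insert_after(&head, &c);
	insert_after(&head, &b);
	insert_after(&head, &a);
	assert(foreach_call(&head, sum_and_drop_even, &sum) == 0);
	*remaining = 0;
	NODE_FOREACH(it, &head)
		(*remaining)++;
	return sum;
}
```

In a kernel setting the list lock would be dropped around the callback; the
marker is what keeps the position valid across that window.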


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep awaiting the debugger's opinion
about the signal without any problem. The hook is hence located in
issignal, at the place where normally SIGCHLD is sent to the debugger,
while the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception) and stopping the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct emul to provide an emulation specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
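
The fallback rule described in this commit (a NULL e_fault means "use
uvm_fault") can be sketched as follows. This is only an illustration under
stated assumptions, not the actual mips trap handler: the reduced struct
definitions, the two stub handlers and the helper name handle_fault() are
hypothetical, and only the e_fault prototype is taken from the log message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
typedef unsigned long vaddr_t;
typedef int vm_fault_t;
typedef int vm_prot_t;
struct vm_map { int dummy; };
struct proc;

struct emul {
	/* Emulation-specific fault handler, or NULL for the default. */
	int (*e_fault)(struct proc *, vaddr_t, vm_fault_t, vm_prot_t);
};

struct proc {
	const struct emul *p_emul;
	struct vm_map *p_map;	/* simplified; assumption for this sketch */
};

/* Stub standing in for uvm_fault(); returns 0 as a marker value. */
static int
uvm_fault_stub(struct vm_map *map, vaddr_t va, vm_fault_t ft, vm_prot_t prot)
{
	(void)map; (void)va; (void)ft; (void)prot;
	return 0;
}

/* Stub standing in for an emulation handler; returns 1 as a marker. */
static int
irix_fault_stub(struct proc *p, vaddr_t va, vm_fault_t ft, vm_prot_t prot)
{
	(void)p; (void)va; (void)ft; (void)prot;
	return 1;
}

/* Dispatch a fault: use the emulation hook if set, else the default. */
static int
handle_fault(struct proc *p, vaddr_t va, vm_fault_t ft, vm_prot_t prot)
{
	assert(p != NULL && p->p_emul != NULL);
	if (p->p_emul->e_fault != NULL)
		return (*p->p_emul->e_fault)(p, va, ft, prot);
	return uvm_fault_stub(p->p_map, va, ft, prot);
}
```

Note that the hook receives the struct proc * rather than the struct vm_map *,
which matches the prototype difference pointed out in the commit message.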


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed the obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
OS-specific async I/O behavior should now be handled in OS-specific code. Linux
has been done, but the other emulations still need to be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data that has been collected so far:

                                    Net  Free  Open  Linux  SunOS  AIX  OSF1  Darwin
send SIGIO to write end of pipe     Y    N     N     N      N      N    Y     Y
send SIGIO to read end of pipe      Y    Y     N     N      N      ?    Y     ?
send SIGIO to write end of socket   Y    Y     Y     N      N      Y    Y     Y
send SIGIO to read end of socket    Y    Y     Y     Y      Y      ?    Y     ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke the build
of several emulation packages


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
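
A compatibility macro of the kind this commit describes could look roughly
like the sketch below. It is an assumption-laden illustration, not the
contents of the header: struct simplelock is reduced to a single field, and
ltsleep() is a stub whose parameter list is only assumed here (the log says
just that it takes an interlock argument). The stub drops the interlock
immediately instead of sleeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's simple lock type. */
struct simplelock { int locked; };

/*
 * Stub for ltsleep().  A real implementation would release the interlock
 * once the scheduler is locked and then put the caller to sleep; this stub
 * only releases the interlock, if one was supplied, and returns 0.
 */
static int
ltsleep(const void *chan, int pri, const char *wmesg, int timo,
    struct simplelock *interlock)
{
	assert(chan != NULL);
	(void)pri; (void)wmesg; (void)timo;
	if (interlock != NULL)
		interlock->locked = 0;
	return 0;
}

/* Old API: same as ltsleep() with no interlock to release. */
#define	tsleep(chan, pri, wmesg, timo)					\
	ltsleep((chan), (pri), (wmesg), (timo), NULL)
```

Existing callers of tsleep() compile unchanged, while MP-aware callers can
pass the lock that protects their wakeup condition directly to ltsleep().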


# 1.97 31-May-2000 thorpej

Track which CPU a process is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.
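
The element-size idea behind KERN_PROC2 can be sketched in a few lines. The
code below is a hypothetical user-space illustration, not the sysctl
implementation: struct kinfo_proc2_sketch is a made-up three-field record and
copy_records() is an invented helper. It shows how copying exactly the
caller's element size lets an old binary, built against a shorter record,
keep working after the record grows.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-reduced record; the real struct is far larger. */
struct kinfo_proc2_sketch {
	int p_pid;
	int p_ppid;
	int p_stat;	/* imagine this field was added in a later release */
};

/*
 * Copy up to `count` records into `buf`, writing exactly `elemsize` bytes
 * per element.  If the caller's element is shorter than the current record,
 * the tail is dropped; if it is longer, the extra bytes are zeroed.
 * Returns the number of records copied.
 */
static size_t
copy_records(void *buf, size_t elemsize, size_t count,
    const struct kinfo_proc2_sketch *src, size_t nsrc)
{
	size_t i;
	size_t n = nsrc < count ? nsrc : count;
	size_t ncopy = elemsize < sizeof(*src) ? elemsize : sizeof(*src);
	char *dst = buf;

	assert(buf != NULL && src != NULL && elemsize > 0);
	for (i = 0; i < n; i++) {
		memset(dst, 0, elemsize);
		memcpy(dst, &src[i], ncopy);
		dst += elemsize;
	}
	return n;
}
```

This only works if new fields are appended at the end of the record, so that
the old layout stays a prefix of the new one.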

With input and suggestions from many people on tech-kern.


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
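
The sizing rule described above (physical memory divided by 4, then clamped
to machine-dependent bounds) amounts to the following. This is a minimal
sketch, not the kernel code: the function name compute_nkmempages() and its
parameters are hypothetical, with the two bounds playing the role of
NKMEMPAGES_MIN and NKMEMPAGES_MAX.

```c
#include <assert.h>

/*
 * Return the number of pages for the kmem_map: one quarter of physical
 * memory, limited to the range [min_pages, max_pages].
 */
static unsigned long
compute_nkmempages(unsigned long physpages, unsigned long min_pages,
    unsigned long max_pages)
{
	unsigned long n = physpages / 4;

	assert(min_pages <= max_pages);
	if (n > max_pages)
		n = max_pages;
	if (n < min_pages)
		n = min_pages;
	return n;
}
```

For instance, with bounds of 512 and 2048 pages, a machine with 4000 physical
pages gets 1000 pages, while very small or very large machines are pinned to
the lower or upper bound.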


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace the kern.shortcorename sysctl with a more flexible scheme,
a core filename format, which allows changing the name of the core dump
and relocating it to another directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). A process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock every where else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
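
A macro with the behaviour described for P_ZOMBIE() might be written as in
the sketch below. The numeric state values, the one-field struct proc and the
count_zombies() helper are assumptions made for illustration; only the rule
itself (SZOMB or SDEAD counts as a zombie) comes from the log message.

```c
#include <assert.h>
#include <stddef.h>

/* Process states; the numeric values here are illustrative only. */
enum { SIDL = 1, SRUN, SSLEEP, SSTOP, SZOMB, SDEAD };

/* Hypothetical reduced proc structure. */
struct proc { int p_stat; };

/* A proc that is dead or already reaped into a zombie is "a zombie". */
#define	P_ZOMBIE(p)	((p)->p_stat == SZOMB || (p)->p_stat == SDEAD)

/* Example consumer: count the zombie-like entries in an array. */
static int
count_zombies(const struct proc *procs, size_t n)
{
	size_t i;
	int count = 0;

	assert(procs != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (P_ZOMBIE(&procs[i]))
			count++;
	return count;
}
```

Code that previously compared p_stat against SZOMB alone can use the macro so
that it also skips processes still waiting for the reaper.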


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is the
header files as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
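
The effect of biasing p_nice by NZERO can be illustrated with two small
conversion helpers. This is a sketch under assumptions: the helper names are
invented, and the accepted user-visible range of -20..20 is assumed here
rather than taken from the log. Only NZERO = 20 and the unsigned char type
come from the commit message.

```c
#include <assert.h>

#define	NZERO	20	/* bias added to the user-visible nice value */

/* Convert a user-visible nice value to the stored, non-negative form. */
static unsigned char
nice_to_p_nice(int nice)
{
	assert(nice >= -NZERO && nice <= NZERO);	/* assumed range */
	return (unsigned char)(nice + NZERO);
}

/* Convert the stored form back to the user-visible nice value. */
static int
p_nice_to_nice(unsigned char p_nice)
{
	return (int)p_nice - NZERO;
}
```

With the bias equal to the magnitude of the lowest nice value, the stored
field never goes negative, which is what makes an unsigned type safe.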


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.
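
A hold-count pair of this kind can be sketched as below. This is an
assumption-based illustration rather than the macros from the header: the
field name p_holdcnt, the reduced struct proc and the proc_swappable() helper
are hypothetical, and any work the real macros do beyond counting (such as
bringing a process back into core) is left out.

```c
#include <assert.h>

/* Hypothetical reduced proc structure with only a hold count. */
struct proc { int p_holdcnt; };

/* Take a hold: the process must stay in core while the count is non-zero. */
#define	PHOLD(p)	((void)(p)->p_holdcnt++)

/* Drop a hold taken earlier with PHOLD(). */
#define	PRELE(p)	((void)--(p)->p_holdcnt)

/* A process may be swapped out only when nobody holds it. */
static int
proc_swappable(const struct proc *p)
{
	assert(p->p_holdcnt >= 0);
	return p->p_holdcnt == 0;
}
```

Because holds are counted rather than stored as a flag, independent code
paths can each take and release their own hold without clearing someone
else's.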


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.
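
A rough illustration of why a hold count is more flexible than two flag
bits: independent holders can nest without clearing each other's state.
The structure and function names below are hypothetical and only sketch
the idea:

```c
#include <assert.h>

/* Hypothetical stand-in for the relevant part of struct proc. */
struct proc_hold_sketch {
	int holdcnt;	/* number of outstanding holds */
};

/* Take a hold: the process must stay in core while any hold exists. */
static void
hold_proc(struct proc_hold_sketch *p)
{
	p->holdcnt++;
}

/* Drop a hold taken earlier with hold_proc(). */
static void
release_proc(struct proc_hold_sketch *p)
{
	assert(p->holdcnt > 0);
	p->holdcnt--;
}

/* The process may be swapped only when nobody holds it. */
static int
proc_swappable(const struct proc_hold_sketch *p)
{
	return p->holdcnt == 0;
}
```

With a single flag, the first release would wrongly make the process
swappable again while a second holder still depends on it.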


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location , and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.362 06-Apr-2020 kamil

Reintroduce struct proc::p_oppid

Relying on p_opptr is not safe as there is a race between:
- spawner giving a birth to a child process and being killed
- spawnee accessing p_opptr and reporting TRAP_CHLD

PR kern/54786 by Andreas Gustafsson


# 1.361 05-Apr-2020 christos

There is no "s" lock.


# 1.360 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


Revision tags: ad-namecache-base3
# 1.359 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.358 29-Jan-2020 ad

- Track LWPs in a per-process radixtree. It uses no extra memory in the
single threaded case. Replace scans of p->p_lwps with lookups in the
tree. Find free LIDs for new LWPs in the tree. Replace the hashed sleep
queues for park/unpark with lookups in the tree under cover of a RW lock.

- lwp_wait(): if waiting on a specific LWP, find the LWP via tree lookup and
return EINVAL if it's detached, not ESRCH.

- Group the locks in struct proc at the end of the struct in their own cache
line.

- Add some comments.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.357 12-Oct-2019 kamil

branches: 1.357.2;
Remove now unused p_oppid from struct proc


# 1.356 30-Sep-2019 kamil

Move TRAP_CHLD/TRAP_LWP ptrace information from struct proc to siginfo

Storing struct ptrace_state information inside struct proc was vulnerable
to synchronization bugs, as multiple events emitted at the same time were
overwriting one another.

Cache the original parent process id in p_oppid. Reusing p_opptr here is
in theory prone to a slight race condition.

Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for calls
prompting for the value in cases when there wasn't registered an
appropriate event.

Add an alternative approach to check the ptrace_state information, directly
from the siginfo_t value returned from PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compat with older NetBSD and
OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.

Add a couple of compile-time asserts for assumptions in the code.

No functional change intended in existing ptrace(2) software.

All ATF ptrace(2) and ATF GDB tests pass.

This change improves reliability of the threading ptrace(2) code.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.355 15-Jul-2019 pgoyette

Move a comment line to get it next to the line it describes, avoiding
intervening unrelated text.

NFCI


# 1.354 21-Jun-2019 kamil

Eliminate PS_NOTIFYSTOP remnants from the kernel

This flag used to be useful in /proc (BSD4.4-style) debugging semantics.
Traced child events were notified without signaling the parent.

This property was removed in NetBSD-8.0 and had no users.

This change simplifies the signal code, removing dead branches.

NFCI


# 1.353 11-Jun-2019 kamil

Add support for PTRACE_POSIX_SPAWN to report posix_spawn(3) events

posix_spawn(3) is a first class syscall in NetBSD, different to
(V)FORK+EXEC as these operations are executed in one go. This differs to
Linux and FreeBSD, where posix_spawn(3) is implemented with existing kernel
primitives (clone(2), vfork(2), exec(3)) inside libc.

Typically LLDB and GDB software is aware of FORK/VFORK events. As discussed
with the LLDB community, instead of slicing the posix_spawn(3) operation
into phases emulating (V)FORK+EXEC(+VFORK_DONE) and returning intermediate
state to the debugger, that might have abnormal state, introduce new event
type: PTRACE_POSIX_SPAWN.

A debugger implementor can easily map it into existing fork+exec semantics
or treat as a distinct event.

There is no functional change for existing debuggers as there was no
support for reporting posix_spawn(3) events on the kernel side.


Revision tags: phil-wifi-20190609 isaki-audio2-base
# 1.352 06-Apr-2019 kamil

Centralized shared part of child_return() into MI part

Add a new function md_child_return() for MD specific bits only.

New child_return() is now part of MI and central code that handles
uniformly tracing code (KTR and ptrace(2)).

Synchronize value passed to ktrsysret() among ports to SYS_fork. This is
a traditional value and accessing p_lflag to check for PL_PPWAIT shall
use locking against proc_lock. Returning SYS_fork vs SYS_vfork still isn't
correct enough as there are more entry points to forking code. Instead of
making it too good, just settle with plain SYS_fork for all ports.


# 1.351 01-Mar-2019 christos

PR/53998: Joel Bertrand: Limit the number of semaphores on a
per-user basis not a per-process. We cannot really keep track on
a per-process basis because a parent process can create the semaphore
and a child can free it taking credit for it. There is also a
similar issue about resource exhaustion if we limited the number
of lwps per process as opposed to per user (which we don't).


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.350 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
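
The tri-state policy listed above can be sketched as a small decision
function. The enum, the function names, and the way "has /dev/kmem open"
is passed in are assumptions made for illustration; the real check lives
in the kernel and may differ:

```c
#include <assert.h>
#include <stdbool.h>

/* The three kern.expose_address values from the log entry. */
enum expose_policy {
	EXPOSE_NONE = 0,	/* no access */
	EXPOSE_KMEM = 1,	/* access to processes with open /dev/kmem */
	EXPOSE_ALL  = 2		/* access to everyone */
};

/* Hypothetical helper: may this caller see kernel addresses? */
static bool
may_expose_address(int policy, bool caller_has_kmem_open)
{
	assert(policy >= EXPOSE_NONE && policy <= EXPOSE_ALL);
	switch (policy) {
	case EXPOSE_ALL:
		return true;
	case EXPOSE_KMEM:
		return caller_has_kmem_open;
	default:
		return false;
	}
}

/* Defaults from the log entry: 0 for KASLR kernels, 1 otherwise. */
static int
default_expose_policy(bool kaslr_kernel)
{
	return kaslr_kernel ? EXPOSE_NONE : EXPOSE_KMEM;
}
```

Evaluating the policy once per sysctl request, rather than once per
process, matches the efficiency point made in the entry.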


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.349 10-Aug-2018 pgoyette

Allow syscall_establish() to install new syscalls when the existing
entry-point is either sys_nomodule or sys_nosys. Update the
makesyscalls.sh script to create a const array of bits to allow
syscall_disestablish() to properly restore the original entry-point.
Update all the initializers of struct emul to initialize the pointer
to the bit array struct emul.

XXX Regen of all files created by makesyscalls.sh will come soon,
XXX followed by a kernel version bump (since struct emul is being
XXX modified).

This commit should address PR kern/45781 and also removes the need
for the work-around for that PR in file

sys/arch/usermode/modules/syscallemu/syscallemu.c


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.348 09-May-2018 kre

branches: 1.348.2;

Cause a process's user and system times to become non-decreasing.

This alters the invented values (ie: statistically calculated)
that are returned - for small values, the values are likely going to
be different than they were, but that's largely nonsense anyway
(except that the sum of utime & stime does equal cpu time consumed
by the process). Once the values get large enough to be meaningful
the difference made by this change will be in the noise, and irrelevant.

This needs a couple of additions to struct proc, so we are now into 8.99.17
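
One plausible way to make the reported times non-decreasing is to
remember the last values handed out and never report less than those.
The sketch below assumes exactly that; the structure, field names, and
clamping rule are illustrative and are not taken from the commit itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process record of the last values reported. */
struct times_sketch {
	uint64_t last_utime;
	uint64_t last_stime;
};

/*
 * Clamp freshly estimated user/system times so that neither value is
 * ever smaller than what was reported on a previous call.
 */
static void
report_times(struct times_sketch *ts, uint64_t est_utime, uint64_t est_stime,
    uint64_t *utime, uint64_t *stime)
{
	assert(ts != NULL && utime != NULL && stime != NULL);

	if (est_utime < ts->last_utime)
		est_utime = ts->last_utime;
	if (est_stime < ts->last_stime)
		est_stime = ts->last_stime;

	ts->last_utime = est_utime;
	ts->last_stime = est_stime;
	*utime = est_utime;
	*stime = est_stime;
}
```

Keeping the last reported pair is consistent with the note that struct
proc needed a couple of extra fields.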


# 1.347 06-May-2018 kamil

Remove an element from struct emul: e_tracesig

e_tracesig used to be implemented for Darwin compat. Nowadays the Darwin
compatibility layer is gone and there are no other users.

This functionality isn't used where it shall be used in the existing
codebase.

If we want to emulate debugging interfaces in compat layers we would need
to implement that from scratch anyway. We would need to be bug compatible
with other OSes too.

Proposed on tech-kern@.

Welcome to NetBSD 8.99.16!

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422
# 1.346 19-Apr-2018 christos

s/static inline/static __inline/g for consistency with other include
headers.


# 1.345 16-Apr-2018 kamil

Remove the rnewprocp argument from fork1(9)

It's now unused and it can cause use-after-free scenarios as noted by
<Mateusz Guzik>.

Reference: http://mail-index.netbsd.org/tech-kern/2017/09/08/msg022267.html

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.344 09-Jan-2018 maya

branches: 1.344.2;
remove struct emul's e_fault.

It used to be used by COMPAT_IRIX for the purpose of overriding
uvm_fault (only implemented in MIPS), now removed.

Ride 8.99.12 version bump.


Revision tags: tls-maxphys-base-20171202
# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files nor Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

branches: 1.340.6;
factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API.

This replaces the previous watchpoint API. The previous one was introduced
only recently in NetBSD-current, so it is removed without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
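
The size claim above can be checked at compile time. The sketch below
assumes pid_t and lwpid_t are both 32-bit integers, as the entry states,
and uses stand-in type and struct names so that it builds outside the
kernel tree:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for pid_t and lwpid_t, assumed to be int32_t-like. */
typedef int32_t sketch_pid_t;
typedef int32_t sketch_lwpid_t;

/* Layout before the change. */
struct ptrace_state_old {
	int		pe_report_event;
	sketch_pid_t	pe_other_pid;
};

/* Layout after the change: the second member becomes a union. */
struct ptrace_state_new {
	int		pe_report_event;
	union {
		sketch_pid_t	_pe_other_pid;
		sketch_lwpid_t	_pe_lwp;
	} _option;
};

/* Fail the build if the overall size changes. */
_Static_assert(sizeof(struct ptrace_state_old) ==
    sizeof(struct ptrace_state_new), "ptrace_state size changed");

/* Runtime form of the same check, also comparing member offsets. */
static int
ptrace_state_layout_unchanged(void)
{
	return sizeof(struct ptrace_state_old) ==
	    sizeof(struct ptrace_state_new) &&
	    offsetof(struct ptrace_state_old, pe_other_pid) ==
	    offsetof(struct ptrace_state_new, _option);
}
```

Because the union sits at the same offset and has the same size as the
old member, binaries built against the old layout keep reading the same
bytes.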


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
parent gives birth to new process child and stops till it exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE notifies a debugger that a parent has
resumed after vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
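
A sketch of the idea behind a pointer-size helper for 64->32 emulation,
assuming a per-process flag marks 32-bit processes. The flag value,
structure, and function name are hypothetical and do not reproduce the
actual PROC_PTRSZ() definition:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag marking a process that runs under 32-bit emulation. */
#define SKETCH_P_32	0x0004

/* Hypothetical stand-in for the relevant part of struct proc. */
struct proc_ptrsz_sketch {
	int p_flag;
};

/* Size of a userland pointer as seen by the given process. */
static size_t
proc_ptrsz(const struct proc_ptrsz_sketch *p)
{
	assert(p != NULL);
	return (p->p_flag & SKETCH_P_32) ? sizeof(uint32_t) : sizeof(void *);
}
```

Code that walks user-space pointer arrays can then step by proc_ptrsz()
instead of assuming the kernel's native pointer size.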


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
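
To illustrate how the three pieces can be recombined into a traditional
wait(2) status word, here is a small sketch. It assumes the classic BSD
encoding (exit code in bits 8-15, signal number in the low 7 bits, core
flag 0200); the structure and names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed classic encoding of the core-dump bit in a wait status. */
#define SKETCH_WCOREFLAG	0200

/* Hypothetical stand-in for the split fields described above. */
struct exit_sketch {
	int xexit;	/* exit code */
	int xsig;	/* terminating signal number, 0 if none */
	int coredumped;	/* non-zero if the process dumped core */
};

/* Rebuild a composite status from the separate fields. */
static int
make_wait_status(const struct exit_sketch *e)
{
	assert(e != NULL);
	int low = e->xsig | (e->coredumped ? SKETCH_WCOREFLAG : 0);
	return ((e->xexit & 0xff) << 8) | low;
}
```

Keeping the fields separate means callers no longer have to guess from
context whether a single value holds a status code or a signal number.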


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adopt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.
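
As a rough illustration of the idea (not the kernel's actual code), a global
counter can be kept with atomic read-modify-write operations instead of a
lock. The sketch below uses C11 <stdatomic.h>; the names nprocs_sketch,
nprocs_inc() and nprocs_dec() are hypothetical and only stand in for whatever
the kernel really uses.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a global process count (zero-initialised). */
static atomic_uint nprocs_sketch;

/* Account for a newly created process; returns the updated count. */
static unsigned int
nprocs_inc(void)
{
	/* atomic_fetch_add returns the value held before the addition. */
	return atomic_fetch_add(&nprocs_sketch, 1u) + 1u;
}

/* Account for a process that has gone away; returns the updated count. */
static unsigned int
nprocs_dec(void)
{
	/* atomic_fetch_sub returns the value held before the subtraction. */
	return atomic_fetch_sub(&nprocs_sketch, 1u) - 1u;
}
```

No lock has to be taken just to adjust the count, which is presumably the
attraction for a value touched on every fork and exit.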


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give the same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time.


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but do no more.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.
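
The bookkeeping described above can be sketched in a few lines of C. This is
a simplified model under stated assumptions, not the kernel code: the struct
and function names are hypothetical, estcpu is a plain integer, and the decay
of the inherited share that the commit mentions is left out.

```c
/* Hypothetical, reduced process record: only the fields the sketch needs. */
struct proc_sketch {
	unsigned int estcpu;           /* estimated CPU usage */
	unsigned int estcpu_inherited; /* share copied from the parent at fork */
};

/* At fork: the child starts with the parent's estcpu and remembers it. */
static void
fork_hook_sketch(const struct proc_sketch *parent, struct proc_sketch *child)
{
	child->estcpu = parent->estcpu;
	child->estcpu_inherited = parent->estcpu;
}

/* At wait: charge the parent only for what the child accrued itself. */
static void
wait_hook_sketch(struct proc_sketch *parent, const struct proc_sketch *child)
{
	if (child->estcpu > child->estcpu_inherited)
		parent->estcpu += child->estcpu - child->estcpu_inherited;
}
```

Without the inherited field, the wait step would add the child's whole estcpu
back, so a parent at 10 whose child ran up 3 more would end at 23 rather than
13, which is the doubling the commit describes.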


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu are not likely increased.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
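
To make the rounding argument concrete, here is a small sketch, assuming the
classic 4BSD decay factor 2*load / (2*load + 1) and an 11-bit fraction. The
names decay_estcpu(), FSHIFT_SKETCH and FSCALE_SKETCH are hypothetical, and
the load average is passed as a whole number to keep the arithmetic visible.

```c
#include <stdint.h>

#define FSHIFT_SKETCH	11			/* assumed number of fraction bits */
#define FSCALE_SKETCH	(1u << FSHIFT_SKETCH)	/* fixed-point value of 1.0 */

/*
 * Apply one decay step: estcpu * (2 * load) / (2 * load + 1).
 * The same arithmetic serves both representations; only the scale of
 * the estcpu argument differs (whole units, or units * FSCALE_SKETCH).
 */
static uint32_t
decay_estcpu(uint32_t estcpu, uint32_t load)
{
	/* 64-bit intermediate so the multiplication cannot overflow. */
	return (uint32_t)(((uint64_t)estcpu * 2 * load) / (2 * load + 1));
}
```

With load = 20, a whole-unit estcpu of 1 truncates to 0 in a single step,
while the scaled value 1 * FSCALE_SKETCH = 2048 only drops to 1998, so most
of the accumulated usage survives the decay when the load is high.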


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
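
A minimal sketch of a marker-skipping iteration, assuming a simple
singly-linked list rather than the kernel's real proclist. Every name here
(node_sketch, skip_markers(), FOREACH_SKIP_MARKERS, sum_pids()) is
hypothetical and only illustrates the idea of stepping over placeholder
entries.

```c
#include <stddef.h>

/* Hypothetical list node; is_marker flags an iterator's placeholder entry. */
struct node_sketch {
	struct node_sketch *next;
	int is_marker;
	int pid;
};

/* Return the first non-marker node at or after n, or NULL at end of list. */
static struct node_sketch *
skip_markers(struct node_sketch *n)
{
	while (n != NULL && n->is_marker)
		n = n->next;
	return n;
}

/* Like a plain list walk, but never hands a marker entry to the loop body. */
#define FOREACH_SKIP_MARKERS(var, head)				\
	for ((var) = skip_markers(head); (var) != NULL;		\
	    (var) = skip_markers((var)->next))

/* Example consumer: add up the pids of all real (non-marker) entries. */
static int
sum_pids(struct node_sketch *head)
{
	struct node_sketch *n;
	int sum = 0;

	FOREACH_SKIP_MARKERS(n, head)
		sum += n->pid;
	return sum;
}
```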


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep awaiting the debugger's opinion
about the signal without it being a problem. The hook is hence located in
issignal, at the place where normally SIGCHLD is sent to the debugger,
whereas the traced process is stopped. If the hook returns 0, we bypass
those operations, the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception), and stop the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.
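
The control flow described above (an explicit next process, or NULL to fall back to chooseproc(), plus a "switching to self" shortcut) might be sketched as follows. Everything prefixed `demo_` is hypothetical, and the one-slot run queue is an assumption for illustration; the real chooseproc() waits for a process to appear rather than returning NULL.

```c
#include <assert.h>
#include <stddef.h>

struct demo_proc { int pid; };

static struct demo_proc *demo_runq;	/* hypothetical one-slot run queue */

/*
 * Stand-in for chooseproc(): takes whatever is queued.  Unlike the real
 * function it does not wait, so it returns NULL when the queue is empty.
 */
static struct demo_proc *
demo_chooseproc(void)
{
	struct demo_proc *p = demo_runq;

	demo_runq = NULL;
	return p;
}

/* Returns the process to run next; sets *same when switching to self. */
static struct demo_proc *
demo_mi_switch(struct demo_proc *cur, struct demo_proc *newp, int *same)
{
	assert(cur != NULL && same != NULL);
	if (newp == NULL)
		newp = demo_chooseproc();	/* no explicit target given */
	*same = (newp == cur);			/* caller may skip the switch */
	return newp;
}
```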


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct proc to provide an emulation-specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specifics should now be handled in OS-specific code. Linux
has been done, but other emulations should be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data that has been collected so far:

                                   Net  Free  Open  Linux  SunOS  AIX  OSF1  Darwin
send SIGIO to write end of pipe    Y    N     N     N      N      N    Y     Y
send SIGIO to read end of pipe     Y    Y     N     N      N      ?    Y     ?
send SIGIO to write end of socket  Y    Y     Y     N      N      Y    Y     Y
send SIGIO to read end of socket   Y    Y     Y     Y      Y      ?    Y     ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke several
emulation package builds


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
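
The compatibility approach described above, keeping the old four-argument call as a macro over the new five-argument one, can be sketched like this. The `demo_` names and the lock type are hypothetical stand-ins, and the stub only models the "release the interlock" step; it does not sleep or touch any scheduler state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a simple lock; not the kernel's lock type. */
struct demo_lock { int held; };

/* Records the last interlock seen, for demonstration only. */
static struct demo_lock *demo_last_interlock;

/* Stub with an ltsleep()-like shape: the final argument is the interlock. */
static int
demo_ltsleep(const void *chan, int pri, const char *wmesg, int timo,
    struct demo_lock *interlock)
{
	assert(chan != NULL);
	(void)pri; (void)wmesg; (void)timo;
	demo_last_interlock = interlock;
	if (interlock != NULL)
		interlock->held = 0;	/* models releasing the interlock */
	return 0;
}

/* Old four-argument API expressed as a macro: no interlock is passed. */
#define demo_tsleep(chan, pri, wmesg, timo) \
	demo_ltsleep((chan), (pri), (wmesg), (timo), NULL)
```

Existing callers keep compiling unchanged through the macro, while new callers that need the race-free behaviour pass their own lock.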


# 1.97 31-May-2000 thorpej

Track which CPU a process is running on/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created processes (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.
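
The KERN_PROC2 item above says the caller passes the size of each element so that old binaries keep working if the structure grows. One way that idea can work is sketched below: copy only the first `elemsize` bytes of each record. The structure, its fields, and `demo_copy_records()` are hypothetical and do not reproduce the actual sysctl implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record; imagine newer_field was appended in a later release. */
struct demo_kinfo {
	int pid;
	int uid;
	int newer_field;
};

/*
 * Copy up to `count` records into `out`, giving each one exactly `elemsize`
 * bytes: truncated if the caller's idea of the record is smaller, zero-padded
 * if it is larger.  Returns the number of records written.
 */
static size_t
demo_copy_records(const struct demo_kinfo *src, size_t nsrc,
    void *out, size_t elemsize, size_t count)
{
	size_t n = nsrc < count ? nsrc : count;
	size_t copy = elemsize < sizeof(*src) ? elemsize : sizeof(*src);
	unsigned char *dst = out;

	assert(src != NULL && out != NULL);
	for (size_t i = 0; i < n; i++) {
		memset(dst, 0, elemsize);
		memcpy(dst, &src[i], copy);
		dst += elemsize;
	}
	return n;
}
```

This only stays compatible if new fields are appended at the end of the record, which is the assumption the sketch makes.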


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
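
The sizing rule described above (a quarter of physical memory, then clamped between a lower and an upper bound) amounts to the following arithmetic. This is a sketch only: `demo_nkmempages()` and its parameters are hypothetical, and the bounds used in any real kernel are set by machine-dependent code or the NKMEMPAGES_MIN/NKMEMPAGES_MAX options.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Take a quarter of the physical pages, then clamp the result to the
 * range [min_pages, max_pages].
 */
static size_t
demo_nkmempages(size_t physpages, size_t min_pages, size_t max_pages)
{
	size_t n = physpages / 4;

	assert(min_pages <= max_pages);
	if (n > max_pages)
		n = max_pages;
	if (n < min_pages)
		n = min_pages;
	return n;
}
```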


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace the kern.shortcorename sysctl with a more flexible scheme,
a core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). The process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock every where else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
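
The two-stage exit path described above can be illustrated with a small sketch: a predicate that treats both SDEAD and SZOMB as "zombie", and a reaper step that moves a process from the first state to the second. The enum values, structure, and `demo_` names are hypothetical; the real state constants and macro live in <sys/proc.h>.

```c
#include <assert.h>

/* Illustrative state values only; not the kernel's numbering. */
enum demo_pstat { DEMO_SRUN = 1, DEMO_SZOMB, DEMO_SDEAD };

struct demo_proc { enum demo_pstat p_stat; };

/* True for a process that is dead, whether or not it has been reaped yet. */
#define DEMO_P_ZOMBIE(p) \
	((p)->p_stat == DEMO_SZOMB || (p)->p_stat == DEMO_SDEAD)

/* The reaper's step: a dead process becomes a zombie that wait4 can collect. */
static void
demo_reap(struct demo_proc *p)
{
	assert(p->p_stat == DEMO_SDEAD);
	p->p_stat = DEMO_SZOMB;
}
```

Code that only cares whether a process is gone can test the combined predicate, while wait4-style code still distinguishes the reaped state.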


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is the
header files as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.
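
The idea can be sketched roughly as follows. This is an illustration only,
not the kernel's code: the structures are trimmed stand-ins, the helper names
session_init() and session_id() are hypothetical, and the field name s_sid is
an assumption (only s_leader appears in the log entry above).

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed stand-in for struct proc (assumed layout). */
struct proc_sketch {
	int p_pid;
};

/* Trimmed stand-in for struct session (assumed layout). */
struct session_sketch {
	struct proc_sketch *s_leader;	/* becomes NULL once the leader exits */
	int s_sid;			/* session ID, stored separately */
};

/* Hypothetical helper: record the leader and cache its ID up front. */
static void
session_init(struct session_sketch *s, struct proc_sketch *leader)
{
	assert(leader != NULL);
	s->s_leader = leader;
	s->s_sid = leader->p_pid;
}

/* The ID never dereferences s_leader, so it survives the leader's exit. */
static int
session_id(const struct session_sketch *s)
{
	return s->s_sid;
}
```

With this layout, clearing s_leader when the leader exits leaves the value
returned by session_id() unchanged.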


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
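
A small sketch of why a bias of 20 keeps the stored value non-negative. The
macro and function names below are hypothetical, and the range of -20..20 for
the user-visible nice value is an assumption based on the traditional BSD
priority range.

```c
#include <assert.h>

#define NZERO_SKETCH	20	/* bias; the log entry sets NZERO to 20 */

/* Map a user-visible nice value (-20..20) to the biased, stored form. */
static unsigned char
nice_to_stored(int nice)
{
	assert(nice >= -NZERO_SKETCH && nice <= NZERO_SKETCH);
	return (unsigned char)(nice + NZERO_SKETCH);
}

/* Recover the user-visible nice value from the stored form. */
static int
stored_to_nice(unsigned char stored)
{
	return (int)stored - NZERO_SKETCH;
}
```

Because the smallest nice value maps to 0, an unsigned char is wide enough
and no sign handling is needed on the stored field.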


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.
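
One way such a pair of macros can work is with a hold counter, in the spirit
of the hold count mentioned in revision 1.27. The sketch below is an
assumption about the general shape, not the actual macros: the _SKETCH names,
the trimmed structure, and proc_swappable() are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Trimmed stand-in for struct proc (assumed layout). */
struct proc_sketch {
	int p_holdcnt;	/* number of outstanding holds */
};

/* Take a hold: the process must stay in core while the count is nonzero. */
#define PHOLD_SKETCH(p)	((void)(p)->p_holdcnt++)

/* Drop a hold; releasing more often than holding is a bug. */
#define PRELE_SKETCH(p)							\
	do {								\
		assert((p)->p_holdcnt > 0);				\
		(p)->p_holdcnt--;					\
	} while (0)

/* A process may be swapped out only when nobody holds it. */
static bool
proc_swappable(const struct proc_sketch *p)
{
	return p->p_holdcnt == 0;
}
```

A counter, unlike a single flag bit, lets several independent holders overlap
without one release cancelling another's hold.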


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location , and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.360 14-Mar-2020 ad

Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.


Revision tags: ad-namecache-base3
# 1.359 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.358 29-Jan-2020 ad

- Track LWPs in a per-process radixtree. It uses no extra memory in the
single threaded case. Replace scans of p->p_lwps with lookups in the
tree. Find free LIDs for new LWPs in the tree. Replace the hashed sleep
queues for park/unpark with lookups in the tree under cover of a RW lock.

- lwp_wait(): if waiting on a specific LWP, find the LWP via tree lookup and
return EINVAL if it's detached, not ESRCH.

- Group the locks in struct proc at the end of the struct in their own cache
line.

- Add some comments.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.357 12-Oct-2019 kamil

branches: 1.357.2;
Remove now unused p_oppid from struct proc


# 1.356 30-Sep-2019 kamil

Move TRAP_CHLD/TRAP_LWP ptrace information from struct proc to siginfo

Storing struct ptrace_state information inside struct proc was vulnerable
to synchronization bugs, as multiple events emitted at the same time were
overwriting other ones.

Cache the original parent process id in p_oppid. Reusing here p_opptr is
in theory prone to a slight race condition.

Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for calls
prompting for the value in cases when there wasn't registered an
appropriate event.

Add an alternative approach to check the ptrace_state information, directly
from the siginfo_t value returned from PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compat with older NetBSD and
OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.

Add a couple of compile-time asserts for assumptions in the code.

No functional change intended in existing ptrace(2) software.

All ATF ptrace(2) and ATF GDB tests pass.

This change improves reliability of the threading ptrace(2) code.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.355 15-Jul-2019 pgoyette

Move a comment line to get it next to the line it describes, avoiding
intervening unrelated text.

NFCI


# 1.354 21-Jun-2019 kamil

Eliminate PS_NOTIFYSTOP remnants from the kernel

This flag used to be useful in /proc (BSD4.4-style) debugging semantics.
Traced child events were notified without signaling the parent.

This property was removed in NetBSD-8.0 and had no users.

This change simplifies the signal code, removing dead branches.

NFCI


# 1.353 11-Jun-2019 kamil

Add support for PTRACE_POSIX_SPAWN to report posix_spawn(3) events

posix_spawn(3) is a first class syscall in NetBSD, different to
(V)FORK+EXEC as these operations are executed in one go. This differs to
Linux and FreeBSD, where posix_spawn(3) is implemented with existing kernel
primitives (clone(2), vfork(2), exec(3)) inside libc.

Typically LLDB and GDB software is aware of FORK/VFORK events. As discussed
with the LLDB community, instead of slicing the posix_spawn(3) operation
into phases emulating (V)FORK+EXEC(+VFORK_DONE) and returning intermediate
state to the debugger, that might have abnormal state, introduce new event
type: PTRACE_POSIX_SPAWN.

A debugger implementor can easily map it into existing fork+exec semantics
or treat as a distinct event.

There is no functional change for existing debuggers as there was no
support for reporting posix_spawn(3) events on the kernel side.


Revision tags: phil-wifi-20190609 isaki-audio2-base
# 1.352 06-Apr-2019 kamil

Centralized shared part of child_return() into MI part

Add a new function md_child_return() for MD specific bits only.

New child_return() is now part of MI and central code that handles
uniformly tracing code (KTR and ptrace(2)).

Synchronize value passed to ktrsysret() among ports to SYS_fork. This is
a traditional value and accessing p_lflag to check for PL_PPWAIT shall
use locking against proc_lock. Returning SYS_fork vs SYS_vfork still isn't
correct enough as there are more entry points to forking code. Instead of
making it too good, just settle with plain SYS_fork for all ports.


# 1.351 01-Mar-2019 christos

PR/53998: Joel Bertrand: Limit the number of semaphores on a
per-user basis not a per-process. We cannot really keep track on
a per-process basis because a parent process can create the semaphore
and a child can free it taking credit for it. There is also a
similar issue about resource exhaustion if we limited the number
of lwps per process as opposed to per user (which we don't).


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.350 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
    0: no access
    1: access to processes with open /dev/kmem
    2: access to everyone
  defaults:
    0: KASLR kernels
    1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
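
The tri-state policy described above can be sketched as a small decision
function. This is an illustration under stated assumptions: the enum, the
function names, and the boolean inputs are hypothetical and are not the
kernel's get_expose_address() interface.

```c
#include <stdbool.h>

/* The three sysctl values listed in the log entry. */
enum expose_policy {
	EXPOSE_NONE = 0,	/* 0: no access */
	EXPOSE_KMEM = 1,	/* 1: processes with /dev/kmem open */
	EXPOSE_ALL  = 2		/* 2: everyone */
};

/* Default policy: 0 for KASLR kernels, 1 for non-KASLR kernels. */
static enum expose_policy
expose_default(bool kaslr_kernel)
{
	return kaslr_kernel ? EXPOSE_NONE : EXPOSE_KMEM;
}

/* Decide once per request whether kernel addresses may be shown. */
static bool
may_expose_address(enum expose_policy policy, bool has_kmem_open)
{
	switch (policy) {
	case EXPOSE_ALL:
		return true;
	case EXPOSE_KMEM:
		return has_kmem_open;
	case EXPOSE_NONE:
	default:
		return false;
	}
}
```

Evaluating the decision once per request and reusing the result for every
process in the reply matches the efficiency point made in the entry.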


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.349 10-Aug-2018 pgoyette

Allow syscall_establish() to install new syscalls when the existing
entry-point is either sys_nomodule or sys_nosys. Update the
makesyscalls.sh script to create a const array of bits to allow
syscall_disestablish() to properly restore the original entry-point.
Update all the initializers of struct emul to initialize the pointer
to the bit array struct emul.

XXX Regen of all files created by makesyscalls.sh will come soon,
XXX followed by a kernel version bump (since struct emul is being
XXX modified).

This commit should address PR kern/45781 and also removes the need
for the work-around for that PR in file

sys/arch/usermode/modules/syscallemu/syscallemu.c


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.348 09-May-2018 kre

branches: 1.348.2;

Cause a process's user and system times to become non-decreasing.

This alters the invented values (ie: statistically calculated)
that are returned - for small values, the values are likely going to
be different than they were, but that's largely nonsense anyway
(except that the sum of utime & stime does equal cpu time consumed
by the process). Once the values get large enough to be meaningful
the difference made by this change will be in the noise, and irrelevant.

This needs a couple of additions to struct proc, so we are now into 8.99.17
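
One possible way to get non-decreasing values while keeping the sum equal to
the CPU time consumed is sketched below. It is not the algorithm used in the
kernel: the structure, the function name, and the clamping rule are
assumptions made for illustration, and the total is assumed to be
non-decreasing and small enough that the multiplication does not overflow.

```c
#include <stdint.h>

/* Hypothetical per-process state: the values reported last time. */
struct ru_state {
	uint64_t last_u;	/* user time reported previously */
	uint64_t last_s;	/* system time reported previously */
};

/*
 * Split `total` in proportion to the sampled tick counts, then clamp so
 * that neither value goes backwards.  The sum always equals `total`.
 */
static void
split_times(struct ru_state *st, uint64_t total,
    uint64_t uticks, uint64_t sticks, uint64_t *up, uint64_t *sp)
{
	uint64_t ticks = uticks + sticks;
	uint64_t u = ticks ? total * uticks / ticks : 0;
	uint64_t s = total - u;

	if (u < st->last_u && total >= st->last_u) {
		/* User time would shrink: pin it, give the rest to system. */
		u = st->last_u;
		s = total - u;
	} else if (s < st->last_s && total >= st->last_s) {
		/* System time would shrink: pin it, give the rest to user. */
		s = st->last_s;
		u = total - s;
	}

	st->last_u = u;
	st->last_s = s;
	*up = u;
	*sp = s;
}
```

As long as the total grows by at least as much as the previous sum, pinning
one value leaves the other at or above its previous report as well.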


# 1.347 06-May-2018 kamil

Remove an element from struct emul: e_tracesig

e_tracesig used to be implemented for Darwin compat. Nowadays the Darwin
compatibility layer is gone and there are no other users.

This functionality isn't used where it shall be used in the existing
codebase.

If we want to emulate debugging interfaces in compat layers we would need
to implement that from scratch anyway. We would need to be bug compatible
with other OSes too.

Proposed on tech-kern@.

Welcome to NetBSD 8.99.16!

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422
# 1.346 19-Apr-2018 christos

s/static inline/static __inline/g for consistency with other include
headers.


# 1.345 16-Apr-2018 kamil

Remove the rnewprocp argument from fork1(9)

It's now unused and it can cause use-after-free scenarios as noted by
<Mateusz Guzik>.

Reference: http://mail-index.netbsd.org/tech-kern/2017/09/08/msg022267.html

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.344 09-Jan-2018 maya

branches: 1.344.2;
remove struct emul's e_fault.

It used to be used by COMPAT_IRIX for the purpose of overriding
uvm_fault (only implemented in MIPS), now removed.

Ride 8.99.12 version bump.


Revision tags: tls-maxphys-base-20171202
# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->p_path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not suitable for use by modern
applications beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

branches: 1.340.6;
factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after FreeBSD API with the usage.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, so its traces are removed without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
	int pe_report_event;
	pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
	int pe_report_event;
	union {
		pid_t _pe_other_pid;
		lwpid_t _pe_lwp;
	} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
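
The size and layout claim can be checked at compile time. The sketch below
uses stand-in typedefs, both assumed to be 32-bit integers as the entry
describes, and hypothetical struct names so that the old and new layouts can
coexist in one file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t pid_sketch_t;	/* stand-in for pid_t (assumed 32-bit) */
typedef int32_t lwpid_sketch_t;	/* stand-in for lwpid_t (assumed 32-bit) */

/* Layout before the change. */
struct old_state {
	int pe_report_event;
	pid_sketch_t pe_other_pid;
};

/* Layout after the change: the second member becomes a union. */
struct new_state {
	int pe_report_event;
	union {
		pid_sketch_t _pe_other_pid;
		lwpid_sketch_t _pe_lwp;
	} _option;
};

/* Same total size, and the union sits where pe_other_pid used to be. */
_Static_assert(sizeof(struct old_state) == sizeof(struct new_state),
    "size must not change");
_Static_assert(offsetof(struct old_state, pe_other_pid) ==
    offsetof(struct new_state, _option),
    "member offset must not change");
```

If either assertion failed, the file would not compile, which is the kind of
compile-time guard that keeps such an ABI assumption from silently breaking.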


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP termination is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
parent gives birth to new process child and stops till it exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE is a notification telling a debugger that a parent has
resumed after vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
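
A macro of this kind might look roughly like the sketch below. The flag bit,
the trimmed structure, and the _SKETCH names are hypothetical; the real
definition in <sys/proc.h> may differ.

```c
#include <stddef.h>
#include <stdint.h>

#define PK_32_SKETCH	0x1	/* hypothetical "32-bit process" flag bit */

/* Trimmed stand-in for struct proc (assumed layout). */
struct proc_sketch {
	int p_flag;
};

/* Pointer size as seen by the process, not by the 64-bit kernel. */
#define PROC_PTRSZ_SKETCH(p)						\
	(((p)->p_flag & PK_32_SKETCH) ? sizeof(uint32_t) : sizeof(void *))
```

Code that walks user-space pointer arrays, such as argument vectors, can then
step by PROC_PTRSZ_SKETCH(p) bytes instead of assuming sizeof(void *).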


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
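
How the three pieces could be recombined into a wait(2)-style status is
sketched below. The bit layout is the traditional one (exit code in bits
8-15, signal in the low 7 bits, core flag 0x80) and is stated here as an
assumption; the function and macro names are hypothetical.

```c
#include <stdbool.h>

#define CORE_FLAG_SKETCH	0x80	/* assumed "core dumped" bit */

/* Build a composite status from the separately stored pieces. */
static int
compose_wait_status(int xexit, int xsig, bool coredumped)
{
	int low = xsig & 0x7f;

	if (coredumped)
		low |= CORE_FLAG_SKETCH;
	return ((xexit & 0xff) << 8) | low;
}

/* Decoders for the same assumed layout. */
static int
status_exit_code(int status)
{
	return (status >> 8) & 0xff;
}

static int
status_signal(int status)
{
	return status & 0x7f;
}

static bool
status_coredumped(int status)
{
	return (status & CORE_FLAG_SKETCH) != 0;
}
```

Keeping the pieces separate means the composite value only needs to be built
at the point where a status is handed back to the waiting parent.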


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub to call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.
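
A minimal userland sketch of the idea, not the kernel code: a global process
count kept with atomic read-modify-write operations instead of under a lock.
It uses C11 <stdatomic.h> rather than the kernel's own atomic primitives, and
every model_* name is hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a global process counter. */
static atomic_uint model_nprocs;

/*
 * Try to account for one more process.  The increment and the limit
 * check are done without a lock: bump the counter first, then back
 * the increment out if the limit was exceeded.
 * Returns 1 on success, 0 if the limit would be exceeded.
 */
int
model_proc_reserve(unsigned int maxproc)
{
	unsigned int count = atomic_fetch_add(&model_nprocs, 1) + 1;

	if (count > maxproc) {
		atomic_fetch_sub(&model_nprocs, 1);
		return 0;
	}
	return 1;
}

/* Drop the accounting for one process. */
void
model_proc_release(void)
{
	atomic_fetch_sub(&model_nprocs, 1);
}

/* Read the current count. */
unsigned int
model_proc_count(void)
{
	return atomic_load(&model_nprocs);
}
```

The increment-then-undo pattern can overshoot the limit briefly while a failed
reservation is being backed out; whether that matters depends on the caller.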


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes to sleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.
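
The first item above describes a class of bug that can be sketched as
follows. This is an illustration only: the entry does not say how the fix was
made. It assumes that the remaining time is converted to whole microseconds
and that a value of zero is read as "sleep with no timeout". All model_*
names are hypothetical.

```c
#include <stdint.h>

/*
 * Truncating conversion: any positive interval shorter than one
 * microsecond becomes 0, which the (assumed) sleep primitive would
 * treat as "no timeout".
 */
int64_t
model_wakeup_usec_truncating(int64_t nsec)
{
	return nsec / 1000;
}

/*
 * Clamped conversion: a positive interval never rounds down to the
 * "no timeout" value.
 */
int64_t
model_wakeup_usec_clamped(int64_t nsec)
{
	int64_t usec = nsec / 1000;

	if (nsec > 0 && usec == 0)
		usec = 1;
	return usec;
}
```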


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give the same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time.


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout from the proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but do no more.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.
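
A simplified model of the accounting described above, as a sketch only. It
leaves out the decay step that updatepri factors out, and the structure and
function names (model_*) are hypothetical, not the kernel's.

```c
/* Hypothetical per-process scheduler state. */
struct model_sched {
	unsigned int estcpu;		/* estimated CPU usage */
	unsigned int estcpu_inherited;	/* share copied from the parent */
};

/* On fork: the child starts from the parent's estcpu and remembers it. */
void
model_fork_hook(const struct model_sched *parent, struct model_sched *child)
{
	child->estcpu = parent->estcpu;
	child->estcpu_inherited = parent->estcpu;
}

/*
 * On wait: charge the parent only for what the child accumulated
 * itself.  Adding child->estcpu wholesale would count the inherited
 * part twice, doubling the parent's value on every fork-wait cycle.
 */
void
model_wait_hook(struct model_sched *parent, const struct model_sched *child)
{
	if (child->estcpu > child->estcpu_inherited)
		parent->estcpu += child->estcpu - child->estcpu_inherited;
}
```

With a parent at 10 and a child that adds 5 of its own, the parent ends at 15
after the wait rather than 25.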


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
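
The effect of the representation change can be sketched with a small model.
It assumes the classic BSD decay filter, cpu * loadfac / (loadfac + 1) with
loadfac = 2 * load, and a fixed-point scale of 2^11; both are assumptions
made for illustration, and the model_* and MODEL_* names are hypothetical.

```c
#include <stdint.h>

#define MODEL_FSHIFT	11			/* assumed fixed-point shift */
#define MODEL_FSCALE	(1u << MODEL_FSHIFT)

/*
 * Decay a CPU estimate by loadfac / (loadfac + 1), with integer
 * truncation.  The same routine serves both representations: pass a
 * plain integer, or a value pre-scaled by MODEL_FSCALE.
 */
uint32_t
model_decay(uint32_t load, uint32_t cpu)
{
	uint64_t loadfac = 2ull * load;

	return (uint32_t)((loadfac * cpu) / (loadfac + 1));
}
```

At a load of 20, a plain integer estimate of 1 truncates to 0, so a single
tick of credit is wiped out regardless of load. The same estimate held as
1 * MODEL_FSCALE decays to 1998, keeping roughly 97.6% of its value.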


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.
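
The POSIX behaviour this PR refers to can be observed from userland. The
sketch below is an illustration, not a test from the PR: with SIGCHLD set to
SIG_IGN, a terminated child is not kept as a zombie, and wait() fails with
ECHILD once all children are gone. The function name is hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Ignore SIGCHLD, fork a child that exits at once, then call wait().
 * Returns the errno reported by wait(), 0 if wait() unexpectedly
 * reaped a child, or -1 if the setup itself failed.
 */
int
model_wait_errno_with_sigchld_ignored(void)
{
	pid_t pid, reaped;
	int status, result;

	if (signal(SIGCHLD, SIG_IGN) == SIG_ERR)
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0);		/* child: terminate immediately */

	/* Blocks until every child has terminated, then fails. */
	errno = 0;
	reaped = wait(&status);
	result = (reaped == -1) ? errno : 0;

	(void)signal(SIGCHLD, SIG_DFL);	/* restore the default disposition */
	return result;
}
```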


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
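
A sketch of marker-skipping iteration over a singly linked list. This is not
the definition of PROCLIST_FOREACH from the header; the structure, the
is_marker field and the macro below are hypothetical and only show the idea.

```c
#include <stddef.h>

/* Hypothetical list node: real entries and iteration markers share it. */
struct model_proc {
	int pid;
	int is_marker;			/* nonzero for a placeholder entry */
	struct model_proc *next;
};

/*
 * Visit every entry, skipping markers.  The "if ... continue; else"
 * form lets the macro be followed by an ordinary statement or block.
 */
#define MODEL_PROCLIST_FOREACH(var, head)				\
	for ((var) = (head); (var) != NULL; (var) = (var)->next)	\
		if ((var)->is_marker)					\
			continue;					\
		else

/* Count the real (non-marker) entries on a list. */
int
model_count_procs(struct model_proc *head)
{
	struct model_proc *p;
	int n = 0;

	MODEL_PROCLIST_FOREACH(p, head) {
		n++;
	}
	return n;
}
```

A marker lets an iterator remember its position while the list lock is
dropped, which is why ordinary walkers have to step over such entries.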


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep waiting for the debugger's opinion
about the signal, and this is not a problem. The hook is hence located in
issignal, at the place where normally SIGCHLD is sent to the debugger,
whereas the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception), and stop the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

If sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct emul to provide an emulation-specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
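
Editorial sketch of the dispatch described above: a trap handler calls the emulation's e_fault hook when one is set and falls back to the generic fault routine when it is NULL. This is a minimal userland illustration, not the kernel code. The simplified struct layouts, the p_map member, uvm_fault_stub(), example_emul_fault() and trap_handle_fault() are all assumptions made for the example; only the e_fault name and the "NULL means uvm_fault" rule come from the log entry.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real kernel structures are much larger. */
struct vm_map { int faults_handled; };
struct proc;

struct emul {
	/* NULL means "use the generic fault routine". */
	int (*e_fault)(struct proc *, unsigned long, int, int);
};

struct proc {
	const struct emul *p_emul;
	struct vm_map *p_map;		/* hypothetical shortcut to the map */
};

/* Hypothetical stand-in for uvm_fault(): counts calls, reports success. */
static int
uvm_fault_stub(struct vm_map *map, unsigned long va, int fault_type,
    int access_type)
{
	(void)va; (void)fault_type; (void)access_type;
	map->faults_handled++;
	return 0;
}

/* Hypothetical emulation-specific handler, in the role of irix_vm_fault. */
static int emul_fault_calls;

static int
example_emul_fault(struct proc *p, unsigned long va, int fault_type,
    int access_type)
{
	(void)p; (void)va; (void)fault_type; (void)access_type;
	emul_fault_calls++;
	return 0;
}

/* Dispatch: prefer the emulation hook, otherwise use the generic path. */
static int
trap_handle_fault(struct proc *p, unsigned long va, int fault_type,
    int access_type)
{
	assert(p != NULL && p->p_emul != NULL);
	if (p->p_emul->e_fault != NULL)
		return (*p->p_emul->e_fault)(p, va, fault_type, access_type);
	return uvm_fault_stub(p->p_map, va, fault_type, access_type);
}
```

Note that the hook takes the process rather than the map as its first argument, which is the prototype difference the entry points out.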


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed the obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specificities should now be handled in OS-specific code. Linux
has been done, but other emulations should be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data that has been collected so far:

                                   Net  Free Open Linux SunOS AIX  OSF1 Darwin
send SIGIO to write end of pipe    Y    N    N    N     N     N    Y    Y
send SIGIO to read end of pipe     Y    Y    N    N     N     ?    Y    ?
send SIGIO to write end of socket  Y    Y    Y    N     N     Y    Y    Y
send SIGIO to read end of socket   Y    Y    Y    Y     Y     ?    Y    ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke several
emulation package builds


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
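
Editorial sketch of the compatibility macro mentioned above: the old four-argument tsleep() can be expressed as ltsleep() with a NULL interlock. This is a userland illustration under stated assumptions, not the kernel implementation. The one-field struct simplelock, the stub body of ltsleep() (which only drops the interlock and returns 0) and the call counter are hypothetical; only the names tsleep, ltsleep and the extra interlock argument come from the log entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's simple lock. */
struct simplelock { int locked; };

static int ltsleep_calls;

/*
 * Stub ltsleep(): a real implementation would release the interlock only
 * after the scheduler is locked and then put the caller to sleep. Here we
 * just drop the interlock, if one was passed, and return success.
 */
static int
ltsleep(void *chan, int pri, const char *wmesg, int timo,
    struct simplelock *interlock)
{
	(void)chan; (void)pri; (void)wmesg; (void)timo;
	ltsleep_calls++;
	if (interlock != NULL) {
		assert(interlock->locked);
		interlock->locked = 0;
	}
	return 0;
}

/* Old API kept as a thin wrapper: no interlock to release. */
#define tsleep(chan, pri, wmesg, timo) \
	ltsleep((chan), (pri), (wmesg), (timo), NULL)
```

Existing callers keep compiling unchanged, while callers that need the race-free behavior pass their lock as the fifth argument.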


# 1.97 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.
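
Editorial sketch of the element-size idea behind KERN_PROC2: the caller states how large it believes each record is, and the producer copies at most that many bytes per record, so a binary built against an older, shorter structure still gets correctly strided data. This is an illustration only. struct kinfo_sketch, its fields and copy_records() are hypothetical, and zero-filling the tail when the caller asks for more than the producer has is an assumption of this sketch, not something the log entry states.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record; imagine newer_field was appended in a later release. */
struct kinfo_sketch {
	int pid;
	int uid;
	int newer_field;
};

/*
 * Copy `count` records into `dst`, which the caller has laid out as
 * `count` slots of `elem_size` bytes each. Returns the bytes written.
 */
static size_t
copy_records(void *dst, size_t elem_size, const struct kinfo_sketch *src,
    size_t count)
{
	unsigned char *out = dst;
	size_t n = elem_size < sizeof(*src) ? elem_size : sizeof(*src);
	size_t i;

	assert(dst != NULL && src != NULL);
	for (i = 0; i < count; i++) {
		/* Zero the slot first so any unfilled tail is defined. */
		memset(out + i * elem_size, 0, elem_size);
		/* Copy only the prefix both sides agree on. */
		memcpy(out + i * elem_size, &src[i], n);
	}
	return count * elem_size;
}
```

The scheme relies on new fields only ever being appended, so that the shorter layout is always a prefix of the longer one.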


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
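
Editorial sketch of the sizing rule described above: take a quarter of physical memory (in pages) and clamp the result between a lower and an upper bound. The function name nkmempages_autosize() and its parameters are hypothetical; the log entry gives only the divide-by-4 rule and the existence of the NKMEMPAGES_MIN and NKMEMPAGES_MAX bounds.

```c
#include <assert.h>

/*
 * Hypothetical helper: returns the number of pages for the kmem_map,
 * given the physical page count and the machine-dependent bounds.
 */
static unsigned long
nkmempages_autosize(unsigned long physmem_pages, unsigned long min_pages,
    unsigned long max_pages)
{
	unsigned long n;

	assert(min_pages <= max_pages);
	n = physmem_pages / 4;		/* one quarter of physical memory */
	if (n < min_pages)
		n = min_pages;		/* enforce the lower bound */
	if (n > max_pages)
		n = max_pages;		/* enforce the upper bound */
	return n;
}
```

For example, with bounds of 100 and 2000 pages, a machine with 4000 physical pages would get 1000, one with 100 pages would be raised to 100, and one with 40000 pages would be capped at 2000.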


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). A process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock every where else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
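
Editorial sketch of a P_ZOMBIE()-style predicate as described above: a process counts as a zombie if it is in either SZOMB or SDEAD. The numeric state values and the one-field struct proc are illustrative assumptions, and the macro body is a plausible reading of the log entry rather than a quotation of the header.

```c
#include <assert.h>

/* Illustrative state values; the real header defines its own numbering. */
enum {
	SRUN = 2,	/* runnable */
	SZOMB = 5,	/* reaped by the reaper, awaiting wait4 */
	SDEAD = 6	/* dead, not yet processed by the reaper */
};

/* Simplified stand-in for struct proc. */
struct proc {
	int p_stat;
};

/* True for both "dead" and "zombie" processes. */
#define P_ZOMBIE(p) \
	((p)->p_stat == SZOMB || (p)->p_stat == SDEAD)
```

Code that previously tested only for SZOMB can use the macro to treat the short-lived SDEAD state the same way.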


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is all the
header files, as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
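
A rough sketch of the offset scheme, assuming the user-visible nice range is -20..20 (the helper names and NZERO_SKETCH are hypothetical):

```c
/* Offset added to the user-visible nice value; 20 as in the commit above. */
#define NZERO_SKETCH	20

/* Map a user-visible nice value (-20..20) to the stored form (0..40). */
static unsigned char
nice_to_stored(int nice)
{
	return (unsigned char)(nice + NZERO_SKETCH);
}

/* Map the stored unsigned value back to the user-visible nice value. */
static int
stored_to_nice(unsigned char stored)
{
	return (int)stored - NZERO_SKETCH;
}
```

With the offset at 20, the lowest nice value maps to 0, so the stored field never needs a sign.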


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.
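
A simplified sketch of hold-count macros in this spirit, assuming a p_holdcnt counter like the hold count introduced in 1.27 (the struct, the _SKETCH macro names, and can_swap_out() are hypothetical; the real macros may do more, such as swapping the process back in):

```c
/* Hypothetical stand-in for struct proc, reduced to the hold count. */
struct held_proc {
	int p_holdcnt;	/* number of outstanding holds */
};

/* Take a hold: the process must stay in core while the count is nonzero. */
#define PHOLD_SKETCH(p)	((void)(p)->p_holdcnt++)

/* Drop a hold taken earlier with PHOLD_SKETCH(). */
#define PRELE_SKETCH(p)	((void)--(p)->p_holdcnt)

/* A swapper would only consider processes with no holds. */
static int
can_swap_out(const struct held_proc *p)
{
	return p->p_holdcnt == 0;
}
```

A counter rather than a flag lets independent code paths hold the same process without stepping on each other.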


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location , and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.359 23-Feb-2020 ad

UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.


# 1.358 29-Jan-2020 ad

- Track LWPs in a per-process radixtree. It uses no extra memory in the
single threaded case. Replace scans of p->p_lwps with lookups in the
tree. Find free LIDs for new LWPs in the tree. Replace the hashed sleep
queues for park/unpark with lookups in the tree under cover of a RW lock.

- lwp_wait(): if waiting on a specific LWP, find the LWP via tree lookup and
return EINVAL if it's detached, not ESRCH.

- Group the locks in struct proc at the end of the struct in their own cache
line.

- Add some comments.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.357 12-Oct-2019 kamil

Remove now unused p_oppid from struct proc


# 1.356 30-Sep-2019 kamil

Move TRAP_CHLD/TRAP_LWP ptrace information from struct proc to siginfo

Storing struct ptrace_state information inside struct proc was vulnerable
to synchronization bugs, as multiple events emitted at the same time were
overwriting one another.

Cache the original parent process id in p_oppid. Reusing p_opptr here is
in theory prone to a slight race condition.

Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for calls
prompting for the value in cases when no appropriate event was
registered.

Add an alternative approach to check the ptrace_state information, directly
from the siginfo_t value returned from PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compat with older NetBSD and
OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.

Add a couple of compile-time asserts for assumptions in the code.

No functional change intended in existing ptrace(2) software.

All ATF ptrace(2) and ATF GDB tests pass.

This change improves reliability of the threading ptrace(2) code.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base
# 1.355 15-Jul-2019 pgoyette

Move a comment line to get it next to the line it describes, avoiding
intervening unrelated text.

NFCI


# 1.354 21-Jun-2019 kamil

Eliminate PS_NOTIFYSTOP remnants from the kernel

This flag used to be useful in /proc (BSD4.4-style) debugging semantics.
Traced child events were notified without signaling the parent.

This property was removed in NetBSD-8.0 and had no users.

This change simplifies the signal code, removing dead branches.

NFCI


# 1.353 11-Jun-2019 kamil

Add support for PTRACE_POSIX_SPAWN to report posix_spawn(3) events

posix_spawn(3) is a first class syscall in NetBSD, different to
(V)FORK+EXEC as these operations are executed in one go. This differs to
Linux and FreeBSD, where posix_spawn(3) is implemented with existing kernel
primitives (clone(2), vfork(2), exec(3)) inside libc.

Typically LLDB and GDB software is aware of FORK/VFORK events. As discussed
with the LLDB community, instead of slicing the posix_spawn(3) operation
into phases emulating (V)FORK+EXEC(+VFORK_DONE) and returning intermediate
state to the debugger, that might have abnormal state, introduce new event
type: PTRACE_POSIX_SPAWN.

A debugger implementor can easily map it into existing fork+exec semantics
or treat as a distinct event.

There is no functional change for existing debuggers as there was no
support for reporting posix_spawn(3) events on the kernel side.


Revision tags: phil-wifi-20190609 isaki-audio2-base
# 1.352 06-Apr-2019 kamil

Centralized shared part of child_return() into MI part

Add a new function md_child_return() for MD specific bits only.

New child_return() is now part of MI and central code that handles
uniformly tracing code (KTR and ptrace(2)).

Synchronize value passed to ktrsysret() among ports to SYS_fork. This is
a traditional value and accessing p_lflag to check for PL_PPWAIT shall
use locking against proc_lock. Returning SYS_fork vs SYS_vfork still isn't
correct enough as there are more entry points to forking code. Instead of
making it too good, just settle with plain SYS_fork for all ports.


# 1.351 01-Mar-2019 christos

PR/53998: Joel Bertrand: Limit the number of semaphores on a
per-user basis not a per-process. We cannot really keep track on
a per-process basis because a parent process can create the semaphore
and a child can free it taking credit for it. There is also a
similar issue about resource exhaustion if we limited the number
of lwps per process as opposed to per user (which we don't).


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.350 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
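
The tri-state policy above can be sketched as a small decision function. This is an illustration only: the enum names and both helpers are hypothetical and are not the kernel's get_expose_address() implementation.

```c
#include <stdbool.h>

/* Hypothetical names for the three kern.expose_address settings. */
enum expose_mode_sketch {
	EXPOSE_NONE = 0,	/* no access */
	EXPOSE_KMEM_OPEN = 1,	/* only processes with /dev/kmem open */
	EXPOSE_ALL = 2		/* everyone */
};

/* Decide whether kernel addresses may be shown to the requesting process. */
static bool
may_expose_address(int mode, bool has_kmem_open)
{
	switch (mode) {
	case EXPOSE_ALL:
		return true;
	case EXPOSE_KMEM_OPEN:
		return has_kmem_open;
	default:
		return false;
	}
}

/* Default setting: 0 for KASLR kernels, 1 otherwise. */
static int
default_expose_mode(bool kaslr)
{
	return kaslr ? EXPOSE_NONE : EXPOSE_KMEM_OPEN;
}
```

Evaluating the decision once per sysctl request, rather than once per process, matches the efficiency note in the commit message.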


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.349 10-Aug-2018 pgoyette

Allow syscall_establish() to install new syscalls when the existing
entry-point is either sys_nomodule or sys_nosys. Update the
makesyscalls.sh script to create a const array of bits to allow
syscall_disestablish() to properly restore the original entry-point.
Update all the initializers of struct emul to initialize the pointer
to the bit array struct emul.

XXX Regen of all files created by makesyscalls.sh will come soon,
XXX followed by a kernel version bump (since struct emul is being
XXX modified).

This commit should address PR kern/45781 and also removes the need
for the work-around for that PR in file

sys/arch/usermode/modules/syscallemu/syscallemu.c


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.348 09-May-2018 kre

branches: 1.348.2;

Cause a process's user and system times to become non-decreasing.

This alters the invented values (ie: statistically calculated)
that are returned - for small values, the values are likely going to
be different than they were, but that's largely nonsense anyway
(except that the sum of utime & stime does equal cpu time consumed
by the process). Once the values get large enough to be meaningful
the difference made by this change will be in the noise, and irrelevant.

This needs a couple of additions to struct proc, so we are now into 8.99.17


# 1.347 06-May-2018 kamil

Remove an element from struct emul: e_tracesig

e_tracesig used to be implemented for Darwin compat. Nowadays the Darwin
compatibility layer is gone and there are no other users.

This functionality isn't used where it shall be used in the existing
codebase.

If we want to emulate debugging interfaces in compat layers we would need
to implement that from scratch anyway. We would need to be bug compatible
with other OSes too.

Proposed on tech-kern@.

Welcome to NetBSD 8.99.16!

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422
# 1.346 19-Apr-2018 christos

s/static inline/static __inline/g for consistency with other include
headers.


# 1.345 16-Apr-2018 kamil

Remove the rnewprocp argument from fork1(9)

It's now unused and it can cause use-after-free scenarios as noted by
<Mateusz Guzik>.

Reference: http://mail-index.netbsd.org/tech-kern/2017/09/08/msg022267.html

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.344 09-Jan-2018 maya

branches: 1.344.2;
remove struct emul's e_fault.

It used to be used by COMPAT_IRIX for the purpose of overriding
uvm_fault (only implemented in MIPS), now removed.

Ride 8.99.12 version bump.


Revision tags: tls-maxphys-base-20171202
# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->p_path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

branches: 1.340.6;
factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API and its usage.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, so remove its traces without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event;
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

    typedef struct ptrace_state {
            int     pe_report_event;
            pid_t   pe_other_pid;
    } ptrace_state_t;

to

    typedef struct ptrace_state {
            int     pe_report_event;
            union {
                    pid_t   _pe_other_pid;
                    lwpid_t _pe_lwp;
            } _option;
    } ptrace_state_t;

    #define pe_other_pid    _option._pe_other_pid
    #define pe_lwp          _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
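
The size claim can be checked with a compile-time assertion. The sketch below uses int32_t stand-ins for pid_t and lwpid_t (an assumption based on the "int32_t-like" remark above) and hypothetical struct names so that both layouts can coexist in one file; the accessor macros are left out to avoid clashing with the old member name.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for pid_t and lwpid_t, assumed to be 32-bit integers. */
typedef int32_t sk_pid_t;
typedef int32_t sk_lwpid_t;

/* Layout before the change. */
struct old_ptrace_state {
	int		pe_report_event;
	sk_pid_t	pe_other_pid;
};

/* Layout after the change: the second member becomes a union. */
struct new_ptrace_state {
	int		pe_report_event;
	union {
		sk_pid_t	_pe_other_pid;
		sk_lwpid_t	_pe_lwp;
	} _option;
};

/* Same size and same member offset, so prebuilt binaries keep working. */
_Static_assert(sizeof(struct old_ptrace_state) ==
    sizeof(struct new_ptrace_state), "ptrace_state size changed");
_Static_assert(offsetof(struct old_ptrace_state, pe_other_pid) ==
    offsetof(struct new_ptrace_state, _option), "member offset changed");
```

If either assertion failed, the change would be an ABI break rather than a source-level rename.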


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP termination is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
a parent gives birth to a new child process and stops until the child
exits or calls exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE is a notification telling a debugger that a parent has
resumed after a vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
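
A sketch of the idea, assuming a per-process marker for 32-bit emulation (the struct, the is_32bit field, and proc_ptrsz() are hypothetical names, not the actual macro definition):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct proc with a 32-bit emulation marker. */
struct proc_ptr_sketch {
	bool is_32bit;	/* process runs under 32-bit emulation */
};

/* Pointer size as seen by the process, not by the kernel. */
static size_t
proc_ptrsz(const struct proc_ptr_sketch *p)
{
	return p->is_32bit ? sizeof(uint32_t) : sizeof(void *);
}
```

Code that walks userland pointer arrays can then step by the size the process itself uses instead of assuming the kernel's pointer size.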


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
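
To illustrate how the three pieces can be recombined into a wait(2)-style status, here is a sketch using the traditional encoding (exit code in bits 8-15, signal number in the low 7 bits, core flag 0x80). The struct and helper names are hypothetical, and the encoding is an assumption rather than something this commit states.

```c
#include <stdbool.h>

/* Traditional core-dump bit in a wait status (assumed, see lead-in). */
#define SK_COREFLAG	0x80

/* Hypothetical stand-in for the split fields. */
struct exit_info_sketch {
	int	xexit;		/* exit code */
	int	xsig;		/* terminating signal, 0 if none */
	bool	coredumped;	/* process dumped core */
};

/* Rebuild a composite wait status from the split fields. */
static int
compose_wait_status(const struct exit_info_sketch *e)
{
	int sig = e->xsig & 0x7f;

	if (e->coredumped)
		sig |= SK_COREFLAG;
	return ((e->xexit & 0xff) << 8) | sig;
}
```

Keeping the fields separate means each consumer reads exactly the piece it needs, and the composite value is only built where wait(2) semantics require it.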


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adopt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update a few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.
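
For illustration only, a minimal sketch of maintaining a global process
count with atomic operations instead of a lock. It uses C11 <stdatomic.h>
as a stand-in for the kernel's own atomic primitives, and the names
nprocs_sketch, proc_count_inc() and proc_count_dec() are hypothetical, not
taken from the NetBSD source.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's global process count. */
static atomic_uint nprocs_sketch = 1;

/*
 * Account for one new process without taking a lock.
 * Returns 0 on success, -1 if the count already reached the limit.
 */
static int
proc_count_inc(unsigned int limit)
{
	unsigned int old = atomic_load(&nprocs_sketch);

	do {
		if (old >= limit)
			return -1;
		/* On failure the CAS reloads 'old' and the check repeats. */
	} while (!atomic_compare_exchange_weak(&nprocs_sketch, &old, old + 1));
	return 0;
}

/* Drop one process from the count. */
static void
proc_count_dec(void)
{
	unsigned int old = atomic_fetch_sub(&nprocs_sketch, 1);

	assert(old > 0);	/* the count must never underflow */
}
```

The compare-and-swap loop keeps the limit check and the increment in one
atomic step, which a plain atomic increment alone would not.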


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time.


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but do no more.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
on each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is not likely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
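
For illustration only, a sketch of why an integer estcpu always loses at
least 1 per decay step while a fixed-point value can decay more slowly
under high load. It assumes the traditional BSD decay factor
loadfac / (loadfac + 1) and a fixed-point shift of 11 bits. Both are
assumptions here, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FSHIFT	11			/* assumed fixed-point shift */
#define FSCALE	(1 << FSHIFT)		/* fixed-point representation of 1.0 */

/*
 * Integer decay: estcpu * loadfac / (loadfac + 1).
 * Truncation means any non-zero estcpu drops by at least 1.
 */
static uint32_t
decay_int(uint32_t loadfac, uint32_t estcpu)
{
	return (loadfac * estcpu) / (loadfac + 1);
}

/*
 * Fixed-point decay: both arguments are scaled by FSCALE, so the
 * fractional part of the result survives the division.
 */
static uint32_t
decay_fixpt(uint32_t loadfac_fp, uint32_t estcpu_fp)
{
	return (uint32_t)(((uint64_t)loadfac_fp * estcpu_fp) /
	    (loadfac_fp + FSCALE));
}
```

With loadfac = 40, an integer estcpu of 1 decays straight to 0, whereas
the fixed-point value 1.0 (FSCALE) decays to 1998/2048, roughly 0.98.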


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
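
For illustration only, a sketch of a list-iteration macro that skips
marker entries, in the spirit of the PROCLIST_FOREACH description above.
The struct, its is_marker flag and the macro name are hypothetical; the
real macro operates on the kernel's proclist.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; a marker is a placeholder, not a real process. */
struct node {
	struct node *next;
	int is_marker;
	int pid;
};

/*
 * Iterate over every node, skipping markers. The trailing "else" lets
 * the caller's loop body attach directly after the macro.
 */
#define NODE_FOREACH(var, head)						\
	for ((var) = (head); (var) != NULL; (var) = (var)->next)	\
		if ((var)->is_marker)					\
			continue;					\
		else

/* Count the non-marker entries on the list. */
static int
count_real(struct node *head)
{
	struct node *n;
	int count = 0;

	NODE_FOREACH(n, head) {
		count++;
	}
	return count;
}
```

An iterator that has to drop the list lock can leave a marker behind to
remember its position, and ordinary walkers using such a macro never see
it.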


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep while waiting for the debugger's
decision about the signal without causing a problem. The hook is hence
located in issignal, at the place where SIGCHLD is normally sent to the
debugger and the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception) and stopping the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct emul to provide an emulation-specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and to synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
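
The "NULL means use the default handler" convention described above can be
sketched as follows. This is an illustration under stated assumptions: the
toy_ names, the simplified one-argument handler signature and the return
values are all hypothetical and are not the kernel's struct emul, uvm_fault
or irix_vm_fault.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified fault handler type (the real one takes more arguments). */
typedef int (*toy_fault_fn)(unsigned long va);

/* Hypothetical stand-in for struct emul with only the relevant field. */
struct toy_emul {
	const char *e_name;
	toy_fault_fn e_fault;	/* NULL means: use the default handler */
};

/* Stand-in for the default handler; returns 0 to identify itself. */
int
toy_default_fault(unsigned long va)
{
	(void)va;
	return 0;
}

/* Stand-in for an emulation-specific handler; returns 1 to identify itself. */
int
toy_emul_fault(unsigned long va)
{
	(void)va;
	return 1;
}

/* What a trap handler might do: prefer the hook, else fall back. */
int
toy_handle_fault(const struct toy_emul *e, unsigned long va)
{
	if (e->e_fault != NULL)
		return e->e_fault(va);
	return toy_default_fault(va);
}

/* Report which handler ran for an emulation with or without a hook. */
int
toy_fault_demo(int use_hook)
{
	struct toy_emul e = { "toy", use_hook ? toy_emul_fault : NULL };

	return toy_handle_fault(&e, 0x1000UL);
}
```

Keeping the fallback in the caller means emulations that need nothing special
only have to leave the field NULL.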


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specificities should now be handled in OS-specific code. Linux
has been done, but other emulations should be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data that has been collected so far:

                                    Net Free Open Linux SunOS AIX OSF1 Darwin
send SIGIO to write end of pipe     Y   N    N    N     N     N   Y    Y
send SIGIO to read end of pipe      Y   Y    N    N     N     ?   Y    ?
send SIGIO to write end of socket   Y   Y    Y    N     N     Y   Y    Y
send SIGIO to read end of socket    Y   Y    Y    Y     Y     ?   Y    ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke several
emulation package build


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
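
The compatibility-macro approach mentioned above can be sketched roughly as
below. This is only an illustration: the toy_ names, the toy_lock type and the
bookkeeping variable are hypothetical, nothing here actually sleeps, and the
real ltsleep() and tsleep() definitions may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a simple lock used as the interlock. */
struct toy_lock {
	int held;
};

/* Records whether the most recent call was given an interlock (illustration only). */
int toy_last_had_interlock = -1;

/* Stand-in for the new interface: the interlock argument is optional. */
int
toy_ltsleep(const void *chan, int pri, const char *wmesg, int timo,
    struct toy_lock *interlock)
{
	(void)chan;
	(void)pri;
	(void)wmesg;
	(void)timo;
	toy_last_had_interlock = (interlock != NULL);
	if (interlock != NULL)
		interlock->held = 0;	/* released on the caller's behalf */
	return 0;
}

/* Old four-argument API expressed in terms of the new one: no interlock. */
#define toy_tsleep(chan, pri, wmesg, timo)				\
	toy_ltsleep((chan), (pri), (wmesg), (timo), NULL)

/* Call through either interface and report whether an interlock was seen. */
int
toy_sleep_demo(int with_interlock)
{
	struct toy_lock lk = { 1 };
	int chan = 0;

	if (with_interlock)
		toy_ltsleep(&chan, 0, "demo", 0, &lk);
	else
		toy_tsleep(&chan, 0, "demo", 0);
	return toy_last_had_interlock;
}
```

Existing callers keep compiling unchanged, while new callers can pass a lock
to close the sleeper/awakener race.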


# 1.97 31-May-2000 thorpej

Track which CPU a process is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* function need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.
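
To illustrate why passing the element size helps old binaries, here is a
minimal sketch. It assumes, for illustration only, that the kernel copies at
most the caller's element size per entry; toy_kinfo, toy_old_kinfo and
toy_copyout_procs are hypothetical names, not the real kinfo_proc2 layout or
sysctl code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "current" record layout; newer_field was added later. */
struct toy_kinfo {
	int pid;
	int uid;
	int newer_field;
};

/* Layout an older binary was compiled against (a prefix of the above). */
struct toy_old_kinfo {
	int pid;
	int uid;
};

/*
 * Copy count records into dst, using the caller's element size as the
 * stride. Each element is zero-filled, then at most sizeof(struct toy_kinfo)
 * bytes are copied, so both smaller and larger callers stay in bounds.
 */
size_t
toy_copyout_procs(const struct toy_kinfo *src, size_t count, void *dst,
    size_t elemsize)
{
	unsigned char *out = dst;
	size_t n = elemsize < sizeof(*src) ? elemsize : sizeof(*src);
	size_t i;

	for (i = 0; i < count; i++) {
		memset(out + i * elemsize, 0, elemsize);
		memcpy(out + i * elemsize, &src[i], n);
	}
	return count * elemsize;
}

/* An "old binary" requests two records using its smaller element size. */
int
toy_old_binary_demo(void)
{
	struct toy_kinfo cur[2] = { { 10, 100, 7 }, { 20, 200, 8 } };
	struct toy_old_kinfo old[2];

	toy_copyout_procs(cur, 2, old, sizeof(old[0]));
	return old[1].pid;
}
```

Because the stride comes from the caller, the second record still lands where
the old binary expects it even though the current structure has grown.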


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
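
The sizing rule described above can be sketched as follows. This is only an
illustration of "divide by 4, then clamp"; the function name, parameter names
and the bounds used in any call are hypothetical and are not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the kernel's code): start from a quarter of
 * physical memory, then apply the lower and upper bounds that
 * machine-dependent code or the kernel config would supply.
 */
static size_t
demo_autosize_kmem_pages(size_t physpages, size_t min_pages, size_t max_pages)
{
	size_t n;

	assert(min_pages <= max_pages);

	n = physpages / 4;	/* a quarter of physical memory */
	if (n < min_pages)
		n = min_pages;	/* lower bound, cf. NKMEMPAGES_MIN */
	if (n > max_pages)
		n = max_pages;	/* upper bound, cf. NKMEMPAGES_MAX */
	return n;
}
```

The lower bound is applied before the upper bound, so the upper bound wins if
the two are ever configured inconsistently; the assertion guards against that.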


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). Process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
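
A minimal sketch of the P_ZOMBIE() idea, assuming stand-in names and state
values (the real constants and macro live in <sys/proc.h> and may differ):

```c
/*
 * Illustrative stand-ins only: the DEMO_ names and numeric values are
 * assumptions, not the definitions from <sys/proc.h>.
 */
enum demo_pstat {
	DEMO_SRUN = 2,		/* runnable */
	DEMO_SZOMB = 5,		/* reaped by the reaper, awaiting wait4 */
	DEMO_SDEAD = 6		/* dead, not yet processed by the reaper */
};

struct demo_proc {
	int p_stat;		/* current process state */
};

/* Treat both "dead" and "zombie" as a zombie for callers that don't care. */
#define DEMO_P_ZOMBIE(p) \
	((p)->p_stat == DEMO_SZOMB || (p)->p_stat == DEMO_SDEAD)
```

Code that only needs to know "this process is on its way out" can test the
macro instead of comparing against each state separately.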


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is all the
header files as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location , and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.358 29-Jan-2020 ad

- Track LWPs in a per-process radixtree. It uses no extra memory in the
single threaded case. Replace scans of p->p_lwps with lookups in the
tree. Find free LIDs for new LWPs in the tree. Replace the hashed sleep
queues for park/unpark with lookups in the tree under cover of a RW lock.

- lwp_wait(): if waiting on a specific LWP, find the LWP via tree lookup and
return EINVAL if it's detached, not ESRCH.

- Group the locks in struct proc at the end of the struct in their own cache
line.

- Add some comments.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.357 12-Oct-2019 kamil

Remove now unused p_oppid from struct proc


# 1.356 30-Sep-2019 kamil

Move TRAP_CHLD/TRAP_LWP ptrace information from struct proc to siginfo

Storing struct ptrace_state information inside struct proc was vulnerable
to synchronization bugs, as multiple events emitted at the same time were
overwriting other ones.

Cache the original parent process id in p_oppid. Reusing here p_opptr is
in theory prone to a slight race condition.

Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for calls
prompting for the value in cases when there wasn't registered an
appropriate event.

Add an alternative approach to check the ptrace_state information, directly
from the siginfo_t value returned from PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compat with older NetBSD and
OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.

Add a couple of compile-time asserts for assumptions in the code.

No functional change intended in existing ptrace(2) software.

All ATF ptrace(2) and ATF GDB tests pass.

This change improves reliability of the threading ptrace(2) code.


Revision tags: netbsd-9-0-RC1 netbsd-9-base
# 1.355 15-Jul-2019 pgoyette

Move a comment line to get it next to the line it describes, avoiding
intervening unrelated text.

NFCI


# 1.354 21-Jun-2019 kamil

Eliminate PS_NOTIFYSTOP remnants from the kernel

This flag used to be useful in /proc (BSD4.4-style) debugging semantics.
Traced child events were notified without signaling the parent.

This property was removed in NetBSD-8.0 and had no users.

This change simplifies the signal code, removing dead branches.

NFCI


# 1.353 11-Jun-2019 kamil

Add support for PTRACE_POSIX_SPAWN to report posix_spawn(3) events

posix_spawn(3) is a first class syscall in NetBSD, different to
(V)FORK+EXEC as these operations are executed in one go. This differs to
Linux and FreeBSD, where posix_spawn(3) is implemented with existing kernel
primitives (clone(2), vfork(2), exec(3)) inside libc.

Typically LLDB and GDB software is aware of FORK/VFORK events. As discussed
with the LLDB community, instead of slicing the posix_spawn(3) operation
into phases emulating (V)FORK+EXEC(+VFORK_DONE) and returning intermediate
state to the debugger, that might have abnormal state, introduce new event
type: PTRACE_POSIX_SPAWN.

A debugger implementor can easily map it into existing fork+exec semantics
or treat as a distinct event.

There is no functional change for existing debuggers as there was no
support for reporting posix_spawn(3) events on the kernel side.


Revision tags: phil-wifi-20190609 isaki-audio2-base
# 1.352 06-Apr-2019 kamil

Centralized shared part of child_return() into MI part

Add a new function md_child_return() for MD specific bits only.

New child_return() is now part of MI and central code that handles
uniformly tracing code (KTR and ptrace(2)).

Synchronize value passed to ktrsysret() among ports to SYS_fork. This is
a traditional value and accessing p_lflag to check for PL_PPWAIT shall
use locking against proc_lock. Returning SYS_fork vs SYS_vfork still isn't
correct enough as there are more entry points to forking code. Instead of
making it too good, just settle with plain SYS_fork for all ports.


# 1.351 01-Mar-2019 christos

PR/53998: Joel Bertrand: Limit the number of semaphores on a
per-user basis not a per-process. We cannot really keep track on
a per-process basis because a parent process can create the semaphore
and a child can free it taking credit for it. There is also a
similar issue about resource exhaustion if we limited the number
of lwps per process as opposed to per user (which we don't).


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.350 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
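
The tri-state policy above can be sketched as a small decision function. The
names below are hypothetical and this is not the kernel's
get_expose_address(); it only restates the table from this message.

```c
#include <stdbool.h>

/* Hypothetical names for the three kern.expose_address settings. */
enum demo_expose {
	DEMO_EXPOSE_NONE = 0,	/* no access */
	DEMO_EXPOSE_KMEM = 1,	/* only processes with /dev/kmem open */
	DEMO_EXPOSE_ALL = 2	/* access to everyone */
};

/* Decide whether a caller may see kernel addresses under a given setting. */
static bool
demo_may_see_addresses(int expose_address, bool has_kmem_open)
{
	switch (expose_address) {
	case DEMO_EXPOSE_ALL:
		return true;
	case DEMO_EXPOSE_KMEM:
		return has_kmem_open;
	default:
		return false;
	}
}

/* Default described above: 0 on KASLR kernels, 1 otherwise. */
static int
demo_default_expose(bool kaslr_kernel)
{
	return kaslr_kernel ? DEMO_EXPOSE_NONE : DEMO_EXPOSE_KMEM;
}
```

Evaluating the setting once per sysctl request, rather than once per process,
matches the efficiency note in the message.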


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.349 10-Aug-2018 pgoyette

Allow syscall_establish() to install new syscalls when the existing
entry-point is either sys_nomodule or sys_nosys. Update the
makesyscalls.sh script to create a const array of bits to allow
syscall_disestablish() to properly restore the original entry-point.
Update all the initializers of struct emul to initialize the pointer
to the bit array struct emul.

XXX Regen of all files created by makesyscalls.sh will come soon,
XXX followed by a kernel version bump (since struct emul is being
XXX modified).

This commit should address PR kern/45781 and also removes the need
for the work-around for that PR in file

sys/arch/usermode/modules/syscallemu/syscallemu.c


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.348 09-May-2018 kre

branches: 1.348.2;

Cause a process's user and system times to become non-decreasing.

This alters the invented values (ie: statistically calculated)
that are returned - for small values, the values are likely going to
be different than they were, but that's largely nonsense anyway
(except that the sum of utime & stime does equal cpu time consumed
by the process). Once the values get large enough to be meaningful
the difference made by this change will be in the noise, and irrelevant.

This needs a couple of additions to struct proc, so we are now into 8.99.17


# 1.347 06-May-2018 kamil

Remove an element from struct emul: e_tracesig

e_tracesig used to be implemented for Darwin compat. Nowadays the Darwin
compatibility layer is gone and there are no other users.

This functionality isn't used where it shall be used in the existing
codebase.

If we want to emulate debugging interfaces in compat layers we would need
to implement that from scratch anyway. We would need to be bug compatible
with other OSes too.

Proposed on tech-kern@.

Welcome to NetBSD 8.99.16!

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422
# 1.346 19-Apr-2018 christos

s/static inline/static __inline/g for consistency with other include
headers.


# 1.345 16-Apr-2018 kamil

Remove the rnewprocp argument from fork1(9)

It's now unused and it can cause use-after-free scenarios as noted by
<Mateusz Guzik>.

Reference: http://mail-index.netbsd.org/tech-kern/2017/09/08/msg022267.html

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.344 09-Jan-2018 maya

branches: 1.344.2;
remove struct emul's e_fault.

It used to be used by COMPAT_IRIX for the purpose of overriding
uvm_fault (only implemented in MIPS), now removed.

Ride 8.99.12 version bump.


Revision tags: tls-maxphys-base-20171202
# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->p_path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable to modern
applications beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change affects neither other procfs files nor Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

branches: 1.340.6;
factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API and its usage.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, so remove its remnants without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
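
The size and source-compatibility argument above can be checked with a small
sketch. The demo_ typedefs and struct names are stand-ins (assumptions), with
both ID types taken to be int32_t as the message states.

```c
#include <stdint.h>

/* Stand-ins for pid_t and lwpid_t, both assumed to be int32_t-like. */
typedef int32_t demo_pid_t;
typedef int32_t demo_lwpid_t;

/* Layout before the change. */
struct demo_old_ptrace_state {
	int pe_report_event;
	demo_pid_t pe_other_pid;
};

/* Layout after the change: the second member becomes a union. */
struct demo_new_ptrace_state {
	int pe_report_event;
	union {
		demo_pid_t _pe_other_pid;
		demo_lwpid_t _pe_lwp;
	} _option;
};

/* Accessor macros keep old source code compiling against the new layout. */
#define pe_other_pid	_option._pe_other_pid
#define pe_lwp		_option._pe_lwp

/* Compile-time check (C11) that the structure size did not change. */
_Static_assert(sizeof(struct demo_old_ptrace_state) ==
    sizeof(struct demo_new_ptrace_state), "ptrace_state size changed");
```

Because both union members have the same width and start at the same offset,
code that reads pe_other_pid sees the same bytes it did before the change.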


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP termination is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
parent gives birth to new process child and stops till it exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE notifies a debugger that a parent has
resumed after vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
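
A minimal sketch of what such a pointer-size macro could look like. The log
gives only the macro name and its purpose, so the struct, the flag value and
the helper below are assumptions, not the definition in <sys/proc.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag marking a 32-bit process on a 64-bit kernel. */
#define SKETCH_PK_32	0x00000004

/* Hypothetical stand-in for the one field of struct proc used here. */
struct sketch_proc {
	int	p_flag;
};

/* Pointer size as seen by the process, not by the kernel. */
#define SKETCH_PROC_PTRSZ(p) \
	(((p)->p_flag & SKETCH_PK_32) ? sizeof(uint32_t) : sizeof(void *))

/* Bytes needed to hold argc userland pointers for process p. */
static size_t
sketch_argv_bytes(const struct sketch_proc *p, size_t argc)
{
	return argc * SKETCH_PROC_PTRSZ(p);
}
```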


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
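
For illustration, the three separate fields can be folded back into a
composite wait(2) status roughly as follows. This sketch assumes the
traditional BSD status encoding (exit code in bits 8-15, signal number in the
low 7 bits, core flag 0200). The struct and function names are hypothetical,
and the kernel's own code may differ.

```c
#include <stdbool.h>

#define SKETCH_WCOREFLAG	0200	/* traditional core-dump bit */

/* Hypothetical holder for the three values the commit splits apart. */
struct sketch_exit_state {
	int	p_xexit;	/* exit code */
	int	p_xsig;		/* terminating signal, 0 if none */
	bool	coredumped;	/* stands in for the WCOREFLAG bit */
};

/* Build a wait(2)-style status word from the separate fields. */
static int
sketch_wait_status(const struct sketch_exit_state *s)
{
	if (s->p_xsig != 0) {
		/* Killed by a signal: low 7 bits plus optional core flag. */
		return (s->p_xsig & 0177) |
		    (s->coredumped ? SKETCH_WCOREFLAG : 0);
	}
	/* Normal exit: exit code in the second byte. */
	return (s->p_xexit & 0xff) << 8;
}
```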


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.
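
The idea can be sketched in userland C as follows. All names here are
hypothetical, and the locking is only described in comments: the writer is
assumed to hold the equivalent of proc_lock, while the reader takes no lock
at all.

```c
#include <stdatomic.h>

/* Hypothetical, heavily reduced stand-in for struct proc. */
struct sketch_proc {
	int			 p_pid;
	struct sketch_proc	*p_pptr;	/* parent; needs the lock */
	atomic_int		 p_ppid;	/* cached parent PID */
};

/*
 * Change the parent. Assumption: the caller holds the process-list lock,
 * so p_pptr and the cached PID are updated together.
 */
static void
sketch_reparent(struct sketch_proc *child, struct sketch_proc *parent)
{
	child->p_pptr = parent;
	atomic_store(&child->p_ppid, parent->p_pid);
}

/* getppid()-style read: no lock and no dereference of p_pptr. */
static int
sketch_getppid(struct sketch_proc *p)
{
	return atomic_load(&p->p_ppid);
}
```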


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.
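
As a rough userland analogue, a process counter maintained with atomics
instead of a lock might look like this. It uses C11 <stdatomic.h> rather than
the kernel's own atomic operations, and the sketch_ names are assumptions.

```c
#include <stdatomic.h>

/* Hypothetical global process count; zero-initialised at program start. */
static atomic_uint sketch_nprocs;

/* Account for a newly created process; returns the new count. */
static unsigned int
sketch_proc_created(void)
{
	return atomic_fetch_add(&sketch_nprocs, 1) + 1;
}

/* Account for a reaped process; returns the new count. */
static unsigned int
sketch_proc_reaped(void)
{
	return atomic_fetch_sub(&sketch_nprocs, 1) - 1;
}

/* Read the current count without taking any lock. */
static unsigned int
sketch_nprocs_read(void)
{
	return atomic_load(&sketch_nprocs);
}
```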


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give the same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time.


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout from the proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but do no more.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu are not likely increased.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
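
The effect described in this commit message can be illustrated with a small sketch. It is not the kernel code: the decay formula (loadfac / (loadfac + 1) with loadfac = 2 * loadav) is the classic BSD form and is assumed here, as are the 11-bit fixed-point shift and every sketch_* name.

```c
#include <stdint.h>

/* Assumed fixed-point format: 11 fractional bits (hypothetical for this sketch). */
#define SKETCH_FSHIFT 11
#define SKETCH_FSCALE (1u << SKETCH_FSHIFT)

typedef uint32_t sketch_fixpt_t;

/*
 * Integer estcpu: truncating division means any non-zero value loses
 * at least 1 per decay step, no matter how high the load average is.
 */
unsigned
sketch_decay_int(unsigned estcpu, unsigned loadav)
{
	unsigned loadfac = 2 * loadav;

	return (loadfac * estcpu) / (loadfac + 1);
}

/*
 * Fixed-point estcpu: the same formula applied to a scaled value keeps
 * the fractional part, so the decay really is slower under high load.
 */
sketch_fixpt_t
sketch_decay_fixpt(sketch_fixpt_t estcpu, unsigned loadav)
{
	uint64_t loadfac = 2 * (uint64_t)loadav;

	return (sketch_fixpt_t)((loadfac * estcpu) / (loadfac + 1));
}
```

With a load average of 20, one integer unit of estcpu decays straight to 0, while the scaled value 1 * SKETCH_FSCALE keeps most of its weight after one step.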


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
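
A minimal sketch of a traversal macro that skips marker entries, in the spirit of the PROCLIST_FOREACH description above. The list layout, the is_marker flag and all sketch_* names are hypothetical; the real macro and the marker insertion and locking done by proclist_foreach_call are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical list node: a marker is a placeholder, not a real process. */
struct sketch_proc {
	struct sketch_proc *next;
	int is_marker;
	int pid;
};

/* Walk the list like a plain FOREACH, but never hand a marker to the body. */
#define SKETCH_PROCLIST_FOREACH(var, head)				\
	for ((var) = (head); (var) != NULL; (var) = (var)->next)	\
		if ((var)->is_marker) continue; else

/* Call fn on every non-marker entry; stop early if fn returns non-zero. */
int
sketch_foreach_call(struct sketch_proc *head,
    int (*fn)(struct sketch_proc *, void *), void *arg)
{
	struct sketch_proc *p;
	int ret = 0;

	SKETCH_PROCLIST_FOREACH(p, head) {
		ret = fn(p, arg);
		if (ret != 0)
			break;
	}
	return ret;
}

/* Example callback: accumulate pids into the int pointed to by arg. */
int
sketch_sum_pid(struct sketch_proc *p, void *arg)
{
	*(int *)arg += p->pid;
	return 0;
}
```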


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep awaiting the debugger's opinion
about the signal, and this is not a problem. The hook is hence located in
issignal, at the place where normally SIGCHLD is sent to the debugger,
while the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception), and stop the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.145 21-Sep-2002 manu

- Introduce a e_fault field in struct proc to provide emulation specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulation
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This bounds us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specifics should now be handled in OS specific code. Linux
has been done, but other emulations should be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data that has been collected so far:

                                   Net  Free  Open  Linux  SunOS  AIX  OSF1  Darwin
send SIGIO to write end of pipe    Y    N     N     N      N      N    Y     Y
send SIGIO to read end of pipe     Y    Y     N     N      N      ?    Y     ?
send SIGIO to write end of socket  Y    Y     Y     N      N      Y    Y     Y
send SIGIO to read end of socket   Y    Y     Y     Y      Y      ?    Y     ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke several
emulation package build


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN, clear the P_SUSPEND
flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
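
The compatibility approach described above, a new function with an extra interlock argument plus a macro that keeps the old call shape, can be sketched as follows. Everything here is a stand-in: sketch_ltsleep() does not sleep, struct sketch_lock is hypothetical, and the real ltsleep() signature should be taken from the kernel sources rather than from this sketch.

```c
#include <stddef.h>

/* Hypothetical interlock type standing in for a kernel simple lock. */
struct sketch_lock {
	int held;
};

/* Records the interlock of the most recent call so the wrapper can be observed. */
const struct sketch_lock *sketch_last_interlock;

/*
 * Stand-in for the new API: takes an optional interlock and drops it.
 * In the commit's description the release happens once the scheduler is
 * locked; this sketch only models the "interlock is released" part.
 */
int
sketch_ltsleep(const void *chan, int pri, const char *wmesg, int timo,
    struct sketch_lock *interlock)
{
	(void)chan; (void)pri; (void)wmesg; (void)timo;

	sketch_last_interlock = interlock;
	if (interlock != NULL)
		interlock->held = 0;
	return 0;
}

/* Old four-argument API expressed in terms of the new one: no interlock. */
#define sketch_tsleep(chan, pri, wmesg, timo)				\
	sketch_ltsleep((chan), (pri), (wmesg), (timo), NULL)
```

Existing callers keep compiling unchanged through the macro, while callers that need the race-free behaviour pass their lock explicitly.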


# 1.97 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* function need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.
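
The element-size argument described for KERN_PROC2 can be illustrated with
a small sketch. This is not the kernel's implementation: the record layout
and the helper name below are hypothetical, and the code only shows how a
caller-supplied element size lets an old binary keep receiving the prefix
of a record that has since grown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a record that may grow in later releases. */
struct info_sketch {
	int pid;
	int uid;
	int added_later;	/* field an old binary does not know about */
};

/*
 * Hypothetical helper: copy up to `count` records into `out`, using a
 * stride of `elemsize` bytes per record as requested by the caller.
 * Each slot receives min(elemsize, sizeof(record)) bytes of data; any
 * remaining bytes in the slot are zeroed.
 * Returns the number of records that fit in `outlen`.
 */
static size_t
copy_records_sketch(const struct info_sketch *src, size_t count,
    void *out, size_t outlen, size_t elemsize)
{
	size_t i;
	size_t n = sizeof(*src) < elemsize ? sizeof(*src) : elemsize;
	unsigned char *dst = out;

	assert(elemsize > 0);
	for (i = 0; i < count && (i + 1) * elemsize <= outlen; i++) {
		memset(dst + i * elemsize, 0, elemsize);
		memcpy(dst + i * elemsize, &src[i], n);
	}
	return i;
}
```

A caller built against an older, shorter record passes its own sizeof as
the element size and sees only the fields it knows about.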

With input and suggestions from many people on tech-kern.


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
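
The sizing rule above (physical memory divided by 4, then clamped by
machine-dependent bounds) can be sketched as follows. This is an
illustration only, not the kernel's code: the function name is
hypothetical, and the bounds stand in for whatever NKMEMPAGES_MIN and
NKMEMPAGES_MAX resolve to on a given port.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: derive a kmem_map size in pages from the number
 * of physical pages, then clamp it to [min_pages, max_pages].
 */
static size_t
nkmempages_sketch(size_t physpages, size_t min_pages, size_t max_pages)
{
	size_t n;

	assert(min_pages <= max_pages);
	n = physpages / 4;	/* one quarter of physical memory */
	if (n < min_pages)
		n = min_pages;	/* lower bound wins on small machines */
	if (n > max_pages)
		n = max_pages;	/* upper bound wins on large machines */
	return n;
}
```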


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace the kern.shortcorename sysctl with a more flexible scheme,
a core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). Process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock in
the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
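
A check of the kind P_ZOMBIE() is described as performing might look like
the sketch below. The state values, the structure, and the function name
are hypothetical stand-ins; the real definitions live in <sys/proc.h> and
may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state values, for illustration only. */
enum proc_state_sketch {
	ST_RUN_SKETCH = 2,
	ST_ZOMB_SKETCH = 5,	/* reaped by the reaper, awaiting wait4 */
	ST_DEAD_SKETCH = 6	/* dead, not yet processed by the reaper */
};

struct proc_sketch {
	int p_stat;
};

/* Treat a process in either the zombie or the dead state as a zombie. */
static bool
p_zombie_sketch(const struct proc_sketch *p)
{
	assert(p != NULL);
	return p->p_stat == ST_ZOMB_SKETCH || p->p_stat == ST_DEAD_SKETCH;
}
```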


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids use of
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
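
The effect of the bias can be sketched as below. This assumes the usual
BSD nice range of -20 to 20; the macro and function names are
hypothetical and only illustrate why adding 20 keeps the stored value
non-negative and within an unsigned char.

```c
#include <assert.h>

#define NZERO_SKETCH 20	/* bias taken from the commit message */

/*
 * Hypothetical helper: map a user-visible nice value (assumed to be in
 * the range -20..20) to the biased value stored in the process.
 */
static unsigned char
nice_to_p_nice_sketch(int nice)
{
	assert(nice >= -20 && nice <= 20);
	return (unsigned char)(nice + NZERO_SKETCH);
}
```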


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location, and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.357 12-Oct-2019 kamil

Remove now unused p_oppid from struct proc


# 1.356 30-Sep-2019 kamil

Move TRAP_CHLD/TRAP_LWP ptrace information from struct proc to siginfo

Storing struct ptrace_state information inside struct proc was vulnerable
to synchronization bugs, as multiple events emitted at the same time were
overwriting one another.

Cache the original parent process id in p_oppid. Reusing p_opptr here is
in theory prone to a slight race condition.

Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for calls
prompting for the value in cases when there wasn't registered an
appropriate event.

Add an alternative approach to check the ptrace_state information, directly
from the siginfo_t value returned from PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compat with older NetBSD and
OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.

Add a couple of compile-time asserts for assumptions in the code.

No functional change intended in existing ptrace(2) software.

All ATF ptrace(2) and ATF GDB tests pass.

This change improves reliability of the threading ptrace(2) code.


Revision tags: netbsd-9-base
# 1.355 15-Jul-2019 pgoyette

Move a comment line to get it next to the line it describes, avoiding
intervening unrelated text.

NFCI


# 1.354 21-Jun-2019 kamil

Eliminate PS_NOTIFYSTOP remnants from the kernel

This flag used to be useful in /proc (BSD4.4-style) debugging semantics.
Traced child events were notified without signaling the parent.

This property was removed in NetBSD-8.0 and had no users.

This change simplifies the signal code, removing dead branches.

NFCI


# 1.353 11-Jun-2019 kamil

Add support for PTRACE_POSIX_SPAWN to report posix_spawn(3) events

posix_spawn(3) is a first class syscall in NetBSD, different to
(V)FORK+EXEC as these operations are executed in one go. This differs from
Linux and FreeBSD, where posix_spawn(3) is implemented with existing kernel
primitives (clone(2), vfork(2), exec(3)) inside libc.

Typically LLDB and GDB software is aware of FORK/VFORK events. As discussed
with the LLDB community, instead of slicing the posix_spawn(3) operation
into phases emulating (V)FORK+EXEC(+VFORK_DONE) and returning intermediate
state to the debugger, that might have abnormal state, introduce new event
type: PTRACE_POSIX_SPAWN.

A debugger implementor can easily map it into existing fork+exec semantics
or treat as a distinct event.

There is no functional change for existing debuggers as there was no
support for reporting posix_spawn(3) events on the kernel side.


Revision tags: phil-wifi-20190609 isaki-audio2-base
# 1.352 06-Apr-2019 kamil

Centralized shared part of child_return() into MI part

Add a new function md_child_return() for MD specific bits only.

New child_return() is now part of MI and central code that handles
uniformly tracing code (KTR and ptrace(2)).

Synchronize value passed to ktrsysret() among ports to SYS_fork. This is
a traditional value and accessing p_lflag to check for PL_PPWAIT shall
use locking against proc_lock. Returning SYS_fork vs SYS_vfork still isn't
correct enough as there are more entry points to forking code. Instead of
making it too good, just settle with plain SYS_fork for all ports.


# 1.351 01-Mar-2019 christos

PR/53998: Joel Bertrand: Limit the number of semaphores on a
per-user basis not a per-process. We cannot really keep track on
a per-process basis because a parent process can create the semaphore
and a child can free it taking credit for it. There is also a
similar issue about resource exhaustion if we limited the number
of lwps per process as opposed to per user (which we don't).


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.350 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
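
The tri-state policy listed above can be sketched as a small decision
function. The names below are hypothetical and this is not the kernel's
get_expose_address(); the sketch only encodes the three settings as the
commit message describes them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three kern.expose_address settings. */
enum expose_sketch {
	EXPOSE_NONE_SKETCH = 0,	/* no access */
	EXPOSE_KMEM_SKETCH = 1,	/* processes with /dev/kmem open */
	EXPOSE_ALL_SKETCH = 2	/* everyone */
};

/*
 * Hypothetical helper: decide whether kernel addresses may be shown to
 * a requesting process, given the sysctl setting and whether that
 * process has /dev/kmem open.
 */
static bool
may_see_addresses_sketch(int setting, bool has_kmem_open)
{
	assert(setting >= EXPOSE_NONE_SKETCH && setting <= EXPOSE_ALL_SKETCH);
	switch (setting) {
	case EXPOSE_ALL_SKETCH:
		return true;
	case EXPOSE_KMEM_SKETCH:
		return has_kmem_open;
	default:
		return false;
	}
}
```

Evaluating the setting once per sysctl request, rather than once per
process, matches the efficiency point made in the list above.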


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.349 10-Aug-2018 pgoyette

Allow syscall_establish() to install new syscalls when the existing
entry-point is either sys_nomodule or sys_nosys. Update the
makesyscalls.sh script to create a const array of bits to allow
syscall_disestablish() to properly restore the original entry-point.
Update all the initializers of struct emul to initialize the pointer
to the bit array struct emul.

XXX Regen of all files created by makesyscalls.sh will come soon,
XXX followed by a kernel version bump (since struct emul is being
XXX modified).

This commit should address PR kern/45781 and also removes the need
for the work-around for that PR in file

sys/arch/usermode/modules/syscallemu/syscallemu.c


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.348 09-May-2018 kre

branches: 1.348.2;

Cause a process's user and system times to become non-decreasing.

This alters the invented values (ie: statistically calculated)
that are returned - for small values, the values are likely going to
be different than they were, but that's largely nonsense anyway
(except that the sum of utime & stime does equal cpu time consumed
by the process). Once the values get large enough to be meaningful
the difference made by this change will be in the noise, and irrelevant.

This needs a couple of additions to struct proc, so we are now into 8.99.17


# 1.347 06-May-2018 kamil

Remove an element from struct emul: e_tracesig

e_tracesig used to be implemented for Darwin compat. Nowadays the Darwin
compatibility layer is gone and there are no other users.

This functionality isn't used where it shall be used in the existing
codebase.

If we want to emulate debugging interfaces in compat layers we would need
to implement that from scratch anyway. We would need to be bug compatible
with other OSes too.

Proposed on tech-kern@.

Welcome to NetBSD 8.99.16!

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422
# 1.346 19-Apr-2018 christos

s/static inline/static __inline/g for consistency with other include
headers.


# 1.345 16-Apr-2018 kamil

Remove the rnewprocp argument from fork1(9)

It's now unused and it can cause use-after-free scenarios as noted by
<Mateusz Guzik>.

Reference: http://mail-index.netbsd.org/tech-kern/2017/09/08/msg022267.html

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.344 09-Jan-2018 maya

branches: 1.344.2;
remove struct emul's e_fault.

It used to be used by COMPAT_IRIX for the purpose of overriding
uvm_fault (only implemented in MIPS), now removed.

Ride 8.99.12 version bump.


Revision tags: tls-maxphys-base-20171202
# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->p_path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

branches: 1.340.6;
factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API and its usage.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, so remove its traces without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event;
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.
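
For illustration only, the kind of per-tracer definition mentioned above might
look like the sketch below. It assumes the documented x86 DR7 layout
(local-enable bit for slot n at bit 2n, R/W field at bits 16+4n, LEN field at
bits 18+4n). The function demo_dr7_set_local() and the DEMO_* constants are
hypothetical names and are not part of any NetBSD or GDB header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants following the x86 DR7 field encoding. */
#define DEMO_DR7_RW_EXEC	0x0u	/* break on instruction execution */
#define DEMO_DR7_RW_WRITE	0x1u	/* break on data writes */
#define DEMO_DR7_RW_RDWR	0x3u	/* break on data reads or writes */
#define DEMO_DR7_LEN_1		0x0u	/* 1-byte region */
#define DEMO_DR7_LEN_2		0x1u	/* 2-byte region */
#define DEMO_DR7_LEN_4		0x3u	/* 4-byte region */

/*
 * Return a copy of dr7 with debug register slot 0..3 locally enabled
 * and its R/W and LEN fields set to the given values.
 */
uint64_t
demo_dr7_set_local(uint64_t dr7, unsigned slot, unsigned rw, unsigned len)
{
	assert(slot < 4 && rw < 4 && len < 4);

	/* Clear the previous R/W and LEN fields of this slot. */
	dr7 &= ~((uint64_t)0xf << (16 + 4 * slot));
	/* Local-enable bits L0..L3 live at bits 0, 2, 4 and 6. */
	dr7 |= (uint64_t)1 << (2 * slot);
	dr7 |= (uint64_t)rw << (16 + 4 * slot);
	dr7 |= (uint64_t)len << (18 + 4 * slot);
	return dr7;
}
```

A tracer would write the resulting value into the DR7 slot of struct dbregs
before issuing PT_SETDBREGS; that step is omitted here because it needs a
live tracee.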

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
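
The size argument above can be checked with a small sketch. The typedefs
demo_pid_t and demo_lwpid_t are stand-ins assumed to be 32-bit, as stated
above, and the struct names are hypothetical; the compatibility macros are
left out so that both layouts can coexist in one file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for pid_t and lwpid_t, both assumed to be int32_t. */
typedef int32_t demo_pid_t;
typedef int32_t demo_lwpid_t;

/* Layout before the change. */
struct demo_old_state {
	int		pe_report_event;
	demo_pid_t	pe_other_pid;
};

/* Layout after the change: the second member becomes a union. */
struct demo_new_state {
	int		pe_report_event;
	union {
		demo_pid_t	_pe_other_pid;
		demo_lwpid_t	_pe_lwp;
	} _option;
};

/* Compile-time check that the structure size did not change. */
static_assert(sizeof(struct demo_old_state) == sizeof(struct demo_new_state),
    "ptrace_state size changed");

/* Returns 1 if the second member kept its offset, 0 otherwise. */
int
demo_layout_unchanged(void)
{
	return offsetof(struct demo_old_state, pe_other_pid) ==
	    offsetof(struct demo_new_state, _option);
}
```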


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP termination is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
parent gives birth to new process child and stops till it exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE notifies a debugger that a parent has
resumed after a vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
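
A minimal sketch of the idea, not the actual macro: the pointer size is
picked per process depending on whether it runs as a 32-bit emulation. The
names DEMO_PK_32, DEMO_PROC_PTRSZ, struct demo_proc and demo_argv_bytes()
are hypothetical, and the flag value is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PK_32	0x00000004	/* hypothetical "32-bit process" flag */

struct demo_proc {
	int	p_flag;
};

/* Pointer size as seen by the process, not by the kernel. */
#define DEMO_PROC_PTRSZ(p) \
	(((p)->p_flag & DEMO_PK_32) ? sizeof(uint32_t) : sizeof(void *))

/* Bytes needed for an argv-style vector of nargs pointers plus a NULL. */
size_t
demo_argv_bytes(const struct demo_proc *p, size_t nargs)
{
	assert(p != NULL);
	return (nargs + 1) * DEMO_PROC_PTRSZ(p);
}
```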


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
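
As a sketch of how the split fields can be recombined when a composite
status is needed, assuming the traditional BSD wait(2) encoding (exit code
in bits 8-15, signal number in the low 7 bits, core flag 0x80). The struct,
the boolean field and demo_wait_status() are hypothetical; in the kernel the
core-dump indication lives in p_sflag as described above.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_WCOREFLAG	0x80	/* core flag in the traditional encoding */

struct demo_proc {
	int	p_xexit;	/* exit code */
	int	p_xsig;		/* signal number, 0 if none */
	bool	p_coredumped;	/* stand-in for the p_sflag bit */
};

/* Build a composite wait(2)-style status from the separate fields. */
int
demo_wait_status(const struct demo_proc *p)
{
	assert(p->p_xexit >= 0 && p->p_xexit <= 0xff);
	assert(p->p_xsig >= 0 && p->p_xsig < 0x7f);

	return (p->p_xexit << 8) | p->p_xsig |
	    (p->p_coredumped ? DEMO_WCOREFLAG : 0);
}
```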


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.
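
A rough userland sketch of the pattern, using C11 <stdatomic.h> as a
stand-in for the kernel's atomic operations. All names here
(demo_nprocs, demo_maxproc, demo_proc_reserve() and friends) are
hypothetical and the limit of 4 is arbitrary; this is not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_uint demo_nprocs;			/* current process count */
static const unsigned demo_maxproc = 4;		/* arbitrary demo limit */

/*
 * Reserve a process slot without taking a lock.
 * Returns 1 on success, 0 if the limit would be exceeded.
 */
int
demo_proc_reserve(void)
{
	/* atomic_fetch_add() returns the old value, so add 1 for the new. */
	unsigned count = atomic_fetch_add(&demo_nprocs, 1) + 1;

	if (count > demo_maxproc) {
		/* Over the limit: undo the increment and fail. */
		atomic_fetch_sub(&demo_nprocs, 1);
		return 0;
	}
	return 1;
}

/* Give a previously reserved slot back. */
void
demo_proc_release(void)
{
	unsigned old = atomic_fetch_sub(&demo_nprocs, 1);

	assert(old > 0);
	(void)old;
}

/* Read the current count. */
unsigned
demo_proc_count(void)
{
	return atomic_load(&demo_nprocs);
}
```

The counter may briefly exceed the limit between the increment and the
rollback, which is acceptable when the value is only used as an admission
check.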


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes to sleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still needs work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout from the proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but do no more.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is not likely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep awaiting the debugger's opinion
about the signal without a problem. The hook is hence located in
issignal, at the place where normally SIGCHLD is sent to the debugger
while the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception) and stopping the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.145 21-Sep-2002 manu

- Introduce a e_fault field in struct proc to provide emulation specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulation
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specifics should now be handled in OS specific code. Linux
has been done, but other emulations should be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data that has been collected so far:

                                   Net  Free  Open  Linux  SunOS  AIX  OSF1  Darwin
send SIGIO to write end of pipe     Y    N     N     N      N      N    Y     Y
send SIGIO to read end of pipe      Y    Y     N     N      N      ?    Y     ?
send SIGIO to write end of socket   Y    Y     Y     N      N      Y    Y     Y
send SIGIO to read end of socket    Y    Y     Y     Y      Y      ?    Y     ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke several
emulation package builds


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
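
A minimal sketch of how a compatibility macro of this kind can be layered
on the new call, assuming the interlock is simply passed as NULL by the old
API. The struct simplelock layout and the ltsleep() body below are
hypothetical stand-ins so the fragment compiles outside the kernel; they are
not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's simple lock type. */
struct simplelock {
	int locked;
};

/* Records the interlock passed to the most recent ltsleep() call. */
static volatile struct simplelock *last_interlock;

/*
 * Stub for illustration only: a real ltsleep() would release the
 * interlock once the scheduler is locked and then put the caller to
 * sleep. Here we only record the argument and drop the lock.
 */
static int
ltsleep(void *ident, int priority, const char *wmesg, int timo,
    volatile struct simplelock *interlock)
{
	(void)ident;
	(void)priority;
	(void)wmesg;
	(void)timo;
	last_interlock = interlock;
	if (interlock != NULL)
		interlock->locked = 0;
	return 0;
}

/* Old API expressed in terms of the new one: no interlock is supplied. */
#define tsleep(ident, priority, wmesg, timo) \
	ltsleep((ident), (priority), (wmesg), (timo), NULL)
```

Existing tsleep() callers keep compiling unchanged, while new code can pass
the lock that protects the condition being waited on.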


# 1.97 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
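
The sizing rule described above (physical memory divided by 4, then clamped
to machine-dependent bounds) can be sketched as follows. The function name
and the two bound values are hypothetical, chosen only for illustration;
real bounds come from machine-dependent code or the kernel config file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounds, in pages; not taken from any real port. */
#define SKETCH_NKMEMPAGES_MIN	1024
#define SKETCH_NKMEMPAGES_MAX	32768

/*
 * Return a kmem_map size in pages: one quarter of physical memory,
 * clamped to the [min, max] range above.
 */
static size_t
sketch_kmem_autosize(size_t physmem_pages)
{
	size_t npages = physmem_pages / 4;

	if (npages < SKETCH_NKMEMPAGES_MIN)
		npages = SKETCH_NKMEMPAGES_MIN;
	if (npages > SKETCH_NKMEMPAGES_MAX)
		npages = SKETCH_NKMEMPAGES_MAX;
	return npages;
}
```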


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace the kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). A process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock every where else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
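
A minimal sketch of what such a predicate macro might look like, assuming
the process state is kept in a p_stat field. The numeric state values and
the struct proc_sketch type are hypothetical and exist only to make the
fragment self-contained.

```c
#include <assert.h>

/* Hypothetical numeric values for the process states. */
enum sketch_proc_state {
	SIDL = 1,
	SRUN,
	SSLEEP,
	SSTOP,
	SZOMB,
	SDEAD
};

/* Hypothetical, cut-down stand-in for struct proc. */
struct proc_sketch {
	int p_stat;
};

/* True for a process that is dead (not yet reaped) or already a zombie. */
#define P_ZOMBIE(p) \
	((p)->p_stat == SZOMB || (p)->p_stat == SDEAD)
```

Code that used to test only for SZOMB can then test P_ZOMBIE(p) and treat
both states the same way.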


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is all the
header files, as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adopt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
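
One way to picture the bias described above, as a sketch only: the
user-visible nice value is offset by NZERO before being stored, so the
stored value is never negative and fits an unsigned char. The helper names
and the clamping range of -20 to 20 are assumptions made for illustration.

```c
#include <assert.h>

#define NZERO		20	/* bias from the commit message */
#define SKETCH_NICE_MIN	(-20)	/* assumed lowest user-visible nice */
#define SKETCH_NICE_MAX	20	/* assumed highest user-visible nice */

/* Convert a user-visible nice value to the biased, non-negative form. */
static unsigned char
sketch_nice_to_p_nice(int nice)
{
	if (nice < SKETCH_NICE_MIN)
		nice = SKETCH_NICE_MIN;
	if (nice > SKETCH_NICE_MAX)
		nice = SKETCH_NICE_MAX;
	return (unsigned char)(nice + NZERO);
}

/* Convert the stored value back to the user-visible nice value. */
static int
sketch_p_nice_to_nice(unsigned char p_nice)
{
	return (int)p_nice - NZERO;
}
```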


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location , and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.356 30-Sep-2019 kamil

Move TRAP_CHLD/TRAP_LWP ptrace information from struct proc to siginfo

Storing struct ptrace_state information inside struct proc was vulnerable
to synchronization bugs, as multiple events emitted at the same time were
overwriting other ones.

Cache the original parent process id in p_oppid. Reusing p_opptr here is
in theory prone to a slight race condition.

Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for calls
prompting for the value in cases when there wasn't registered an
appropriate event.

Add an alternative approach to check the ptrace_state information, directly
from the siginfo_t value returned from PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compat with older NetBSD and
OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.

Add a couple of compile-time asserts for assumptions in the code.

No functional change intended in existing ptrace(2) software.

All ATF ptrace(2) and ATF GDB tests pass.

This change improves reliability of the threading ptrace(2) code.


Revision tags: netbsd-9-base
# 1.355 15-Jul-2019 pgoyette

Move a comment line to get it next to the line it describes, avoiding
intervening unrelated text.

NFCI


# 1.354 21-Jun-2019 kamil

Eliminate PS_NOTIFYSTOP remnants from the kernel

This flag used to be useful in /proc (BSD4.4-style) debugging semantics.
Traced child events were notified without signaling the parent.

This property was removed in NetBSD-8.0 and had no users.

This change simplifies the signal code, removing dead branches.

NFCI


# 1.353 11-Jun-2019 kamil

Add support for PTRACE_POSIX_SPAWN to report posix_spawn(3) events

posix_spawn(3) is a first class syscall in NetBSD, different to
(V)FORK+EXEC as these operations are executed in one go. This differs to
Linux and FreeBSD, where posix_spawn(3) is implemented with existing kernel
primitives (clone(2), vfork(2), exec(3)) inside libc.

Typically LLDB and GDB software is aware of FORK/VFORK events. As discussed
with the LLDB community, instead of slicing the posix_spawn(3) operation
into phases emulating (V)FORK+EXEC(+VFORK_DONE) and returning intermediate
state to the debugger, that might have abnormal state, introduce new event
type: PTRACE_POSIX_SPAWN.

A debugger implementor can easily map it into existing fork+exec semantics
or treat as a distinct event.

There is no functional change for existing debuggers as there was no
support for reporting posix_spawn(3) events on the kernel side.


Revision tags: phil-wifi-20190609 isaki-audio2-base
# 1.352 06-Apr-2019 kamil

Centralized shared part of child_return() into MI part

Add a new function md_child_return() for MD specific bits only.

New child_return() is now part of MI and central code that handles
uniformly tracing code (KTR and ptrace(2)).

Synchronize value passed to ktrsysret() among ports to SYS_fork. This is
a traditional value and accessing p_lflag to check for PL_PPWAIT shall
use locking against proc_lock. Returning SYS_fork vs SYS_vfork still isn't
correct enough as there are more entry points to forking code. Instead of
making it too good, just settle with plain SYS_fork for all ports.


# 1.351 01-Mar-2019 christos

PR/53998: Joel Bertrand: Limit the number of semaphores on a
per-user basis not a per-process. We cannot really keep track on
a per-process basis because a parent process can create the semaphore
and a child can free it taking credit for it. There is also a
similar issue about resource exhaustion if we limited the number
of lwps per process as opposed to per user (which we don't).


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.350 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.349 10-Aug-2018 pgoyette

Allow syscall_establish() to install new syscalls when the existing
entry-point is either sys_nomodule or sys_nosys. Update the
makesyscalls.sh script to create a const array of bits to allow
syscall_disestablish() to properly restore the original entry-point.
Update all the initializers of struct emul to initialize the pointer
to the bit array struct emul.

XXX Regen of all files created by makesyscalls.sh will come soon,
XXX followed by a kernel version bump (since struct emul is being
XXX modified).

This commit should address PR kern/45781 and also removes the need
for the work-around for that PR in file

sys/arch/usermode/modules/syscallemu/syscallemu.c


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.348 09-May-2018 kre

branches: 1.348.2;

Cause a process's user and system times to become non-decreasing.

This alters the invented values (ie: statistically calculated)
that are returned - for small values, the values are likely going to
be different than they were, but that's largely nonsense anyway
(except that the sum of utime & stime does equal cpu time consumed
by the process). Once the values get large enough to be meaningful
the difference made by this change will be in the noise, and irrelevant.

This needs a couple of additions to struct proc, so we are now into 8.99.17


# 1.347 06-May-2018 kamil

Remove an element from struct emul: e_tracesig

e_tracesig used to be implemented for Darwin compat. Nowadays the Darwin
compatibility layer is gone and there are no other users.

This functionality isn't used where it shall be used in the existing
codebase.

If we want to emulate debugging interfaces in compat layers we would need
to implement that from scratch anyway. We would need to be bug compatible
with other OSes too.

Proposed on tech-kern@.

Welcome to NetBSD 8.99.16!

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422
# 1.346 19-Apr-2018 christos

s/static inline/static __inline/g for consistency with other include
headers.


# 1.345 16-Apr-2018 kamil

Remove the rnewprocp argument from fork1(9)

It's now unused and it can cause use-after-free scenarios as noted by
<Mateusz Guzik>.

Reference: http://mail-index.netbsd.org/tech-kern/2017/09/08/msg022267.html

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.344 09-Jan-2018 maya

branches: 1.344.2;
remove struct emul's e_fault.

It used to be used by COMPAT_IRIX for the purpose of overriding
uvm_fault (only implemented in MIPS), now removed.

Ride 8.99.12 version bump.


Revision tags: tls-maxphys-base-20171202
# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

branches: 1.340.6;
factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API and its usage.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, so remove all traces of it without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible for setting global watchpoints/breakpoints with
the debug registers; the PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event monitoring
function is designed to be used for this
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- a debugger is responsible for retrieving the debug register state to
distinguish the exact debug register trap (DR6 is Status Register on x86)
- the kernel must not remove debug register traps after triggering a trap
event; a debugger is responsible for detaching the trap with an appropriate
PT_SETDBREGS call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently another
interface, PT_GET_SIGINFO/PT_SET_SIGINFO, is required in netbsd32 compat code
in order to reliably validate PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state
in a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps the size of ptrace_state_t unchanged, as both pid_t and lwpid_t
are defined as int32_t-like integers. This change does not break existing
prebuilt software and has minimal effect on the need for source-code
changes. In summary, this change should be binary compatible and shouldn't
break the build of existing software.
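
The compatibility argument above can be checked with a small sketch. This
is not the kernel header: the sk_ prefixed names are hypothetical, and
pid_t and lwpid_t are replaced by int32_t stand-ins on the assumption,
taken from the log, that both are int32_t-like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for pid_t and lwpid_t, assumed int32_t-like as the log states. */
typedef int32_t sk_pid_t;
typedef int32_t sk_lwpid_t;

/* Layout before the change. */
struct sk_old_state {
	int pe_report_event;
	sk_pid_t pe_other_pid;
};

/* Record the old member offset before the accessor macros below exist. */
enum { SK_OLD_OFFSET = offsetof(struct sk_old_state, pe_other_pid) };

/* Layout after the change. */
struct sk_new_state {
	int pe_report_event;
	union {
		sk_pid_t _pe_other_pid;
		sk_lwpid_t _pe_lwp;
	} _option;
};

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

/* Same size and same member offset, so prebuilt binaries see no difference. */
_Static_assert(sizeof(struct sk_old_state) == sizeof(struct sk_new_state),
    "size changed");
_Static_assert(SK_OLD_OFFSET == offsetof(struct sk_new_state, _option),
    "offset changed");

/* Old source that names pe_other_pid still compiles against the new layout. */
int
sk_roundtrip(void)
{
	struct sk_new_state ps;

	memset(&ps, 0, sizeof(ps));
	ps.pe_other_pid = 1234;	/* expands to ps._option._pe_other_pid */
	return ps.pe_lwp;	/* shares storage with pe_other_pid */
}
```

The two compile-time assertions cover binary compatibility; the accessor
macros cover source compatibility for code that names the old member.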


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP termination is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when a
parent creates a new child process and stops until the child exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE notifies a debugger that a parent has
resumed after a vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
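
As a rough illustration of how the three pieces relate to the old
composite value, the sketch below rebuilds a wait(2)-style status from
them. It assumes the traditional BSD encoding (exit code in the high byte,
signal number in the low bits, core flag 0200); struct sk_proc, its
dumped_core member and sk_wait_status() are hypothetical names, not the
kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value of the core-dump bit in the traditional BSD status word. */
#define SK_WCOREFLAG 0200

/* Hypothetical stand-in holding the three split fields. */
struct sk_proc {
	int p_xexit;		/* exit code */
	int p_xsig;		/* terminating signal number, 0 if none */
	bool dumped_core;	/* stands in for the core-dump flag bit */
};

/* Recombine the split fields into a composite wait(2)-style status. */
int
sk_wait_status(const struct sk_proc *p)
{
	int sig = p->p_xsig | (p->dumped_core ? SK_WCOREFLAG : 0);

	return (p->p_xexit << 8) | sig;
}
```

Keeping the fields separate means each consumer reads only the part it
needs, and the composite is built once at the point where wait(2) reports
it.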


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
another LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits execution of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.
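
A minimal userland sketch of the idea, using C11 <stdatomic.h> as a
stand-in for the kernel's own atomic operations. The counter and the
sk_ prefixed helpers are hypothetical; the point is only that a shared
process count can be kept consistent without taking a lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the global process counter. */
static atomic_uint sk_nprocs;

/* Count a newly created process; returns the updated count. */
unsigned
sk_proc_created(void)
{
	return atomic_fetch_add(&sk_nprocs, 1) + 1;
}

/* Count a destroyed process; returns the updated count. */
unsigned
sk_proc_destroyed(void)
{
	return atomic_fetch_sub(&sk_nprocs, 1) - 1;
}

/* Read the current count without holding any lock. */
unsigned
sk_nprocs_read(void)
{
	return atomic_load(&sk_nprocs);
}
```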


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still needs work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout from the proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but do no more.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu are not likely increased.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
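
The effect of moving p_estcpu to fixed point can be sketched as below. This is
only an illustration: the names, the 11-bit fraction, and the classic BSD decay
factor 2*load / (2*load + 1) are assumptions made for the example, not code
taken from this revision.

```c
#include <stdint.h>

/* Hypothetical stand-ins; not the kernel's actual definitions. */
typedef uint32_t sketch_fixpt_t;
#define SKETCH_FSHIFT	11			/* assumed fraction bits */
#define SKETCH_FSCALE	(1u << SKETCH_FSHIFT)

/*
 * Integer estcpu: the division truncates, so any non-zero value
 * drops by at least 1 per pass, however high the load is.
 */
static unsigned
sketch_decay_int(unsigned load, unsigned estcpu)
{
	unsigned loadfac = 2 * load;

	return (loadfac * estcpu) / (loadfac + 1);
}

/*
 * Fixed-point estcpu: load and estcpu both carry SKETCH_FSHIFT
 * fraction bits, so a pass can remove much less than one whole unit.
 */
static sketch_fixpt_t
sketch_decay_fixpt(sketch_fixpt_t load, sketch_fixpt_t estcpu)
{
	uint64_t loadfac = 2ull * load;

	return (sketch_fixpt_t)((loadfac * estcpu) / (loadfac + SKETCH_FSCALE));
}
```

With a load of 20, an integer estcpu of 1 decays straight to 0, while the
fixed-point equivalent keeps most of its value, which is the slower decay the
commit message describes.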


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
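
A marker-skipping iteration macro of this kind might look roughly like the
following sketch. The struct, the is_marker field, and the macro name are
hypothetical and chosen for illustration; the real PROCLIST_FOREACH operates on
struct proc and the kernel's list macros.

```c
#include <stddef.h>

/* Hypothetical, cut-down process entry on a singly linked list. */
struct sketch_proc {
	struct sketch_proc *next;
	int pid;
	int is_marker;		/* set on placeholder entries only */
};

/*
 * Walk the list but run the loop body only for real entries.
 * The trailing "if" means the body must not be followed by an "else".
 */
#define SKETCH_PROCLIST_FOREACH(var, head)				\
	for ((var) = (head); (var) != NULL; (var) = (var)->next)	\
		if (!(var)->is_marker)

/* Count the entries a caller of the macro would actually see. */
static int
sketch_count_procs(struct sketch_proc *head)
{
	struct sketch_proc *p;
	int n = 0;

	SKETCH_PROCLIST_FOREACH(p, head)
		n++;
	return n;
}
```

An iterator such as proclist_foreach_call can then leave a marker in the list
to remember its position without other walkers mistaking it for a process.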


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep while waiting for the debugger's
opinion about the signal without any problem. The hook is hence located in
issignal, at the place where SIGCHLD is normally sent to the debugger
while the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception) and stopping the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)
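
The log does not state why the cast was moved. One plausible reason is that
the two spellings differ once integer promotion is involved, which the sketch
below shows with a deliberately narrow unsigned type. The type and macro names
are hypothetical; for a signed int-sized pid_t both forms evaluate to -1.

```c
#include <stdint.h>

/* Hypothetical narrow unsigned id type, used only to expose the difference. */
typedef uint16_t sketch_id_t;

/* Cast first, then negate: the operand is promoted to int, result is int -1. */
#define SKETCH_CAST_THEN_NEGATE	(-(sketch_id_t)1)

/* Negate first, then cast: the result has type sketch_id_t, value 65535. */
#define SKETCH_NEGATE_THEN_CAST	((sketch_id_t)-1)

static long
sketch_cast_then_negate(void)
{
	return SKETCH_CAST_THEN_NEGATE;
}

static long
sketch_negate_then_cast(void)
{
	return SKETCH_NEGATE_THEN_CAST;
}
```

Only the second form is guaranteed to have the id type itself, so it compares
equal to a stored id of that type whatever its width or signedness.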


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.
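
The calling convention described above can be sketched as follows. All names
here are hypothetical stand-ins, and the one-slot run queue is an assumption
made to keep the example small; the real chooseproc() waits until a process is
runnable instead of returning NULL.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct proc; only what the sketch needs. */
struct sketch_proc {
	int pid;
};

/* Hypothetical one-slot "run queue". */
static struct sketch_proc *sketch_runq;

/* Stand-in for chooseproc(): take whatever is queued (may be NULL here). */
static struct sketch_proc *
sketch_chooseproc(void)
{
	struct sketch_proc *p = sketch_runq;

	sketch_runq = NULL;
	return p;
}

/*
 * Pick the switch target: an explicit newp wins, NULL defers to the
 * scheduler. *to_self is set when the target is the current process,
 * which is where a "switching to self" shortcut could apply.
 */
static struct sketch_proc *
sketch_switch_target(struct sketch_proc *cur, struct sketch_proc *newp,
    int *to_self)
{
	if (newp == NULL)
		newp = sketch_chooseproc();
	*to_self = (newp == cur);
	return newp;
}
```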


# 1.145 21-Sep-2002 manu

- Introduce a e_fault field in struct proc to provide emulation specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulation
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This bounds us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
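
A trap handler honouring such a hook might dispatch roughly as in this sketch.
The structures and function names are hypothetical and heavily simplified (the
fault-type and protection arguments are omitted); only the "NULL means use the
generic handler" rule and the proc-versus-map first argument come from the
message above.

```c
#include <stddef.h>

struct sketch_proc;

/* Hypothetical address-space object with a counter for the example. */
struct sketch_map {
	int generic_faults;
};

/* Hypothetical cut-down struct emul: just the optional fault hook. */
struct sketch_emul {
	int (*e_fault)(struct sketch_proc *, unsigned long);
};

struct sketch_proc {
	struct sketch_map *p_map;
	const struct sketch_emul *p_emul;
	int emul_faults;
};

/* Stand-in for the generic fault handler; takes the map. */
static int
sketch_uvm_fault(struct sketch_map *map, unsigned long va)
{
	(void)va;
	map->generic_faults++;
	return 0;
}

/* Example emulation hook; takes the process, then falls through. */
static int
sketch_emul_fault(struct sketch_proc *p, unsigned long va)
{
	p->emul_faults++;
	return sketch_uvm_fault(p->p_map, va);
}

/* Trap-handler side: use e_fault if set, else the generic handler. */
static int
sketch_handle_fault(struct sketch_proc *p, unsigned long va)
{
	if (p->p_emul->e_fault != NULL)
		return p->p_emul->e_fault(p, va);
	return sketch_uvm_fault(p->p_map, va);
}
```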


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.
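
The per-signal lookup described in the list above could be sketched like this.
The field and function names are hypothetical, not the actual sigacts layout;
the only behaviour taken from the message is that version 0 selects the legacy
kernel-provided trampoline and other versions use the registered one.

```c
#include <stddef.h>

#define SKETCH_NSIG	32	/* assumed table size for the example */

typedef void (*sketch_handler_t)(int);

/* Hypothetical per-signal descriptor: action plus trampoline/version. */
struct sketch_sigdesc {
	sketch_handler_t sd_handler;
	const void *sd_tramp;	/* userland-provided trampoline */
	int sd_vers;		/* 0 = legacy kernel-provided trampoline */
};

struct sketch_sigacts {
	struct sketch_sigdesc sa_sigdesc[SKETCH_NSIG];
};

/* Stand-in for the trampoline the kernel used to supply itself. */
static const char sketch_legacy_tramp[1] = { 0 };

/* What a sendsig()-like routine would consult instead of being passed a handler. */
static const void *
sketch_select_tramp(const struct sketch_sigacts *ps, int sig)
{
	const struct sketch_sigdesc *sd = &ps->sa_sigdesc[sig];

	if (sd->sd_vers == 0)
		return sketch_legacy_tramp;
	return sd->sd_tramp;
}
```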


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specificities should now be handled in OS specific code. Linux
has been done, but other emulations still need to be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data that has been collected so far:

                                   Net  Free  Open  Linux  SunOS  AIX  OSF1  Darwin
send SIGIO to write end of pipe    Y    N     N     N      N      N    Y     Y
send SIGIO to read end of pipe     Y    Y     N     N      N      ?    Y     ?
send SIGIO to write end of socket  Y    Y     Y     N      N      Y    Y     Y
send SIGIO to read end of socket   Y    Y     Y     Y      Y      ?    Y     ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke several
emulation package builds


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
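
The compatibility macro described above can be sketched as follows. This is only an illustration, not the kernel's code: ex_ltsleep() is a hypothetical stub that does no sleeping at all and merely "releases" a plain int standing in for the interlock, and ex_tsleep() shows how the old four-argument API could map onto the new call by passing a NULL interlock.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for ltsleep(): the real function sleeps on
 * `ident`; this stub only models the interlock hand-off by clearing
 * the caller's lock word (an int used here in place of a simple lock).
 */
static int
ex_ltsleep(const void *ident, int prio, const char *wmesg, int timo,
    volatile int *interlock)
{
	assert(ident != NULL);
	(void)prio;
	(void)wmesg;
	(void)timo;
	if (interlock != NULL)
		*interlock = 0;		/* "release" the interlock */
	return 0;
}

/* Old-style API: same call with no interlock to release. */
#define	ex_tsleep(ident, prio, wmesg, timo)				\
	ex_ltsleep((ident), (prio), (wmesg), (timo), NULL)
```

Callers that never held an interlock keep compiling unchanged through the macro, while new callers can pass the lock they hold.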


# 1.97 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created processes (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.
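
The element-size idea behind KERN_PROC2 can be sketched like this. It is an assumption-laden illustration, not the sysctl implementation: struct ex_kinfo, struct ex_kinfo_v1 and ex_copy_records() are hypothetical names, and the records are copied with plain memcpy() rather than through a sysctl handler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "current" record layout; newer fields are appended. */
struct ex_kinfo {
	int pid;
	int uid;
	int newfield;
};

/* Hypothetical layout known to an old binary: a prefix of the above. */
struct ex_kinfo_v1 {
	int pid;
	int uid;
};

/*
 * Copy up to `count` records into `out`, each occupying `elem_size`
 * bytes as requested by the caller.  Records are truncated or
 * zero-padded so old and new callers both see a fixed stride.
 * Returns the number of records written.
 */
static size_t
ex_copy_records(const struct ex_kinfo *src, size_t nsrc, void *out,
    size_t elem_size, size_t count)
{
	size_t n = nsrc < count ? nsrc : count;
	size_t ncopy = elem_size < sizeof(*src) ? elem_size : sizeof(*src);
	char *dst = out;
	size_t i;

	assert(out != NULL && elem_size > 0);
	for (i = 0; i < n; i++) {
		memset(dst, 0, elem_size);
		memcpy(dst, &src[i], ncopy);
		dst += elem_size;
	}
	return n;
}
```

Because the caller states its own element size, a binary built against the shorter layout still receives well-formed records after the structure grows.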


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
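
The auto-sizing rule above (physical memory divided by 4, then clamped) can be sketched as below. The bounds EX_NKMEMPAGES_MIN and EX_NKMEMPAGES_MAX and the function ex_nkmempages() are hypothetical; the values are picked only for illustration and are not the kernel's defaults.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounds in pages; real bounds come from MD code or config. */
#define	EX_NKMEMPAGES_MIN	2048
#define	EX_NKMEMPAGES_MAX	65536

/* Return a quarter of physical pages, clamped to [MIN, MAX]. */
static size_t
ex_nkmempages(size_t physpages)
{
	size_t n = physpages / 4;

	assert(EX_NKMEMPAGES_MIN <= EX_NKMEMPAGES_MAX);
	if (n < EX_NKMEMPAGES_MIN)
		n = EX_NKMEMPAGES_MIN;
	if (n > EX_NKMEMPAGES_MAX)
		n = EX_NKMEMPAGES_MAX;
	return n;
}
```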


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace the kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). The process is designated by its
pid at the second level name. These values are inherited on fork, and the
corename format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock every where else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
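
A minimal sketch of the state test described above, assuming a simplified process record. The enum values, struct ex_zproc, EX_P_ZOMBIE() and ex_reap() are hypothetical names for illustration; only the SDEAD to SZOMB transition and the "either state counts as a zombie" rule are taken from the commit message.

```c
#include <assert.h>

/* Hypothetical process states; numeric values are illustrative only. */
enum ex_pstat {
	EX_SIDL = 1,
	EX_SRUN,
	EX_SSLEEP,
	EX_SSTOP,
	EX_SZOMB,	/* reaped by the reaper, waiting for wait4() */
	EX_SDEAD	/* dead, not yet processed by the reaper */
};

struct ex_zproc {
	enum ex_pstat p_stat;
};

/* Treat a process in either SZOMB or SDEAD as a zombie. */
#define	EX_P_ZOMBIE(p)							\
	((p)->p_stat == EX_SZOMB || (p)->p_stat == EX_SDEAD)

/* Reaper step: a dead process becomes a proper zombie. */
static void
ex_reap(struct ex_zproc *p)
{
	assert(p->p_stat == EX_SDEAD);
	p->p_stat = EX_SZOMB;
}
```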


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).
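
The exit-signal rules above can be sketched as follows, under the assumption that the signal is stored in a per-process field. struct ex_sigproc, its p_exitsig field, and the helper functions are hypothetical; only the two rules (0 means no signal, reparenting to init switches to SIGCHLD) come from the commit message.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical process record holding the signal sent to the parent on exit. */
struct ex_sigproc {
	int p_exitsig;	/* 0 means "deliver no exit signal" */
};

/* Simplified accessor: no special-casing of init here. */
#define	EX_P_EXITSIG(p)	((p)->p_exitsig)

/* When a process is handed to init, init expects plain SIGCHLD. */
static void
ex_reparent_to_init(struct ex_sigproc *p)
{
	assert(p != 0);
	p->p_exitsig = SIGCHLD;
}

/* Should the parent be signalled at exit time? */
static int
ex_should_signal_parent(const struct ex_sigproc *p)
{
	return EX_P_EXITSIG(p) != 0;
}
```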


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
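
The offset scheme above can be sketched as below: with NZERO at 20, a user-visible nice value in the range -20..20 is stored as an unsigned value in 0..40. The EX_ names and helper functions are hypothetical, and the clamping to the priority range is an assumption added for the example.

```c
#include <assert.h>

#define	EX_NZERO	20	/* stored value that corresponds to nice 0 */
#define	EX_PRIO_MIN	(-20)
#define	EX_PRIO_MAX	20

/* Convert a user-visible nice value to the always-positive stored form. */
static unsigned char
ex_nice_to_pnice(int nice)
{
	if (nice < EX_PRIO_MIN)
		nice = EX_PRIO_MIN;
	if (nice > EX_PRIO_MAX)
		nice = EX_PRIO_MAX;
	assert(nice + EX_NZERO >= 0);
	return (unsigned char)(nice + EX_NZERO);
}

/* Convert the stored form back to the user-visible nice value. */
static int
ex_pnice_to_nice(unsigned char p_nice)
{
	return (int)p_nice - EX_NZERO;
}
```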


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.
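
Why a count works better than two flags can be sketched as follows: independent holders each take and drop a reference, and the process is eligible for swap-out only when no holds remain. struct ex_holdproc and the ex_ helpers are hypothetical names, not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical process record with a hold count instead of flag bits. */
struct ex_holdproc {
	int p_holdcnt;	/* outstanding holds; 0 means swappable */
};

/* Take a hold, e.g. for the duration of physical I/O. */
static void
ex_hold(struct ex_holdproc *p)
{
	p->p_holdcnt++;
}

/* Drop a hold taken earlier with ex_hold(). */
static void
ex_release(struct ex_holdproc *p)
{
	assert(p->p_holdcnt > 0);
	p->p_holdcnt--;
}

/* The process may be swapped out only when nobody holds it. */
static int
ex_swappable(const struct ex_holdproc *p)
{
	return p->p_holdcnt == 0;
}
```

With flag bits, the first holder to finish would clear the flag while a second holder still depended on it; the counter avoids that.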


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location , and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.355 15-Jul-2019 pgoyette

Move a comment line to get it next to the line it describes, avoiding
intervening unrelated text.

NFCI


# 1.354 21-Jun-2019 kamil

Eliminate PS_NOTIFYSTOP remnants from the kernel

This flag used to be useful in /proc (BSD4.4-style) debugging semantics.
Traced child events were notified without signaling the parent.

This property was removed in NetBSD-8.0 and had no users.

This change simplifies the signal code, removing dead branches.

NFCI


# 1.353 11-Jun-2019 kamil

Add support for PTRACE_POSIX_SPAWN to report posix_spawn(3) events

posix_spawn(3) is a first class syscall in NetBSD, different to
(V)FORK+EXEC as these operations are executed in one go. This differs to
Linux and FreeBSD, where posix_spawn(3) is implemented with existing kernel
primitives (clone(2), vfork(2), exec(3)) inside libc.

Typically LLDB and GDB software is aware of FORK/VFORK events. As discussed
with the LLDB community, instead of slicing the posix_spawn(3) operation
into phases emulating (V)FORK+EXEC(+VFORK_DONE) and returning intermediate
state to the debugger, that might have abnormal state, introduce new event
type: PTRACE_POSIX_SPAWN.

A debugger implementor can easily map it into existing fork+exec semantics
or treat as a distinct event.

There is no functional change for existing debuggers as there was no
support for reporting posix_spawn(3) events on the kernel side.


Revision tags: phil-wifi-20190609 isaki-audio2-base
# 1.352 06-Apr-2019 kamil

Centralized shared part of child_return() into MI part

Add a new function md_child_return() for MD specific bits only.

New child_return() is now part of MI and central code that handles
uniformly tracing code (KTR and ptrace(2)).

Synchronize value passed to ktrsysret() among ports to SYS_fork. This is
a traditional value and accessing p_lflag to check for PL_PPWAIT shall
use locking against proc_lock. Returning SYS_fork vs SYS_vfork still isn't
correct enough as there are more entry points to forking code. Instead of
making it too good, just settle with plain SYS_fork for all ports.


# 1.351 01-Mar-2019 christos

PR/53998: Joel Bertrand: Limit the number of semaphores on a
per-user basis not a per-process. We cannot really keep track on
a per-process basis because a parent process can create the semaphore
and a child can free it taking credit for it. There is also a
similar issue about resource exhaustion if we limited the number
of lwps per process as opposed to per user (which we don't).


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.350 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
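
The tri-state policy listed above can be sketched as a small decision function. The enum names, ex_default_expose() and ex_may_expose() are hypothetical; the `has_kmem_open` argument stands in for whatever check the kernel performs to see whether the requesting process has /dev/kmem open.

```c
#include <assert.h>

/* Hypothetical names for the three kern.expose_address settings. */
enum {
	EX_EXPOSE_NONE = 0,	/* no access */
	EX_EXPOSE_KMEM = 1,	/* only processes with /dev/kmem open */
	EX_EXPOSE_ALL = 2	/* everyone */
};

/* Default setting: 0 on KASLR kernels, 1 otherwise. */
static int
ex_default_expose(int kaslr)
{
	return kaslr ? EX_EXPOSE_NONE : EX_EXPOSE_KMEM;
}

/* Decide once per request whether kernel addresses may be shown. */
static int
ex_may_expose(int level, int has_kmem_open)
{
	assert(level >= EX_EXPOSE_NONE && level <= EX_EXPOSE_ALL);
	switch (level) {
	case EX_EXPOSE_ALL:
		return 1;
	case EX_EXPOSE_KMEM:
		return has_kmem_open != 0;
	default:
		return 0;
	}
}
```

Evaluating the decision once per request and reusing the result for every process in the reply matches the efficiency point made in the commit message.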


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.349 10-Aug-2018 pgoyette

Allow syscall_establish() to install new syscalls when the existing
entry-point is either sys_nomodule or sys_nosys. Update the
makesyscalls.sh script to create a const array of bits to allow
syscall_disestablish() to properly restore the original entry-point.
Update all the initializers of struct emul to initialize the pointer
to the bit array struct emul.

XXX Regen of all files created by makesyscalls.sh will come soon,
XXX followed by a kernel version bump (since struct emul is being
XXX modified).

This commit should address PR kern/45781 and also removes the need
for the work-around for that PR in file

sys/arch/usermode/modules/syscallemu/syscallemu.c


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.348 09-May-2018 kre

branches: 1.348.2;

Cause a process's user and system times to become non-decreasing.

This alters the invented values (ie: statistically calculated)
that are returned - for small values, the values are likely going to
be different than they were, but that's largely nonsense anyway
(except that the sum of utime & stime does equal cpu time consumed
by the process). Once the values get large enough to be meaningful
the difference made by this change will be in the noise, and irrelevant.

This needs a couple of additions to struct proc, so we are now into 8.99.17


# 1.347 06-May-2018 kamil

Remove an element from struct emul: e_tracesig

e_tracesig used to be implemented for Darwin compat. Nowadays the Darwin
compatibility layer is gone and there are no other users.

This functionality isn't used where it shall be used in the existing
codebase.

If we want to emulate debugging interfaces in compat layers we would need
to implement that from scratch anyway. We would need to be bug compatible
with other OSes too.

Proposed on tech-kern@.

Welcome to NetBSD 8.99.16!

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422
# 1.346 19-Apr-2018 christos

s/static inline/static __inline/g for consistency with other include
headers.


# 1.345 16-Apr-2018 kamil

Remove the rnewprocp argument from fork1(9)

It's now unused and it can cause use-after-free scenarios as noted by
<Mateusz Guzik>.

Reference: http://mail-index.netbsd.org/tech-kern/2017/09/08/msg022267.html

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.344 09-Jan-2018 maya

branches: 1.344.2;
remove struct emul's e_fault.

It used to be used by COMPAT_IRIX for the purpose of overriding
uvm_fault (only implemented in MIPS), now removed.

Ride 8.99.12 version bump.


Revision tags: tls-maxphys-base-20171202
# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->p_path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files nor the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

branches: 1.340.6;
factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, so remove its traces without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event;
a debugger is responsible for detaching this trap with an appropriate
PT_SETDBREGS call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
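
The size and layout claim can be checked at compile time. The following is a
sketch under stated assumptions: the sketch_* typedefs stand in for the
system's pid_t and lwpid_t (described above as int32_t-like), and the struct
tags old_ptrace_state / new_ptrace_state are hypothetical names used only to
put both layouts side by side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for pid_t and lwpid_t. */
typedef int32_t sketch_pid_t;
typedef int32_t sketch_lwpid_t;

/* Layout before the change. */
struct old_ptrace_state {
	int pe_report_event;
	sketch_pid_t pe_other_pid;
};

/* Record the old member offset before the compatibility macros exist. */
enum { OLD_PID_OFFSET = offsetof(struct old_ptrace_state, pe_other_pid) };

/* Layout after the change. */
struct new_ptrace_state {
	int pe_report_event;
	union {
		sketch_pid_t _pe_other_pid;
		sketch_lwpid_t _pe_lwp;
	} _option;
};

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

/* Same size, and the pid lives at the same offset as before. */
_Static_assert(sizeof(struct old_ptrace_state) ==
    sizeof(struct new_ptrace_state), "size must not change");
_Static_assert(OLD_PID_OFFSET ==
    offsetof(struct new_ptrace_state, pe_other_pid), "offset must not change");

/* Existing source that uses pe_other_pid keeps compiling unchanged. */
int
same_storage(void)
{
	struct new_ptrace_state st;

	st.pe_report_event = 0;
	st.pe_other_pid = 1234;
	/* Both names refer to the same 32-bit slot. */
	return st.pe_lwp == 1234;
}
```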


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
parent gives birth to new process child and stops till it exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE notifies a debugger that a parent has
resumed after vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
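
A rough illustration of what such a macro can look like. Everything prefixed
SK_ or sk_ is hypothetical (including the flag value); the sketch only
assumes that a per-process flag marks a 32-bit process running on a 64-bit
kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PK_32 0x00000004	/* hypothetical "32-bit process" flag */

struct sk_proc {
	int p_flag;
};

/* Pointer size as seen by the process, not by the kernel. */
#define SK_PROC_PTRSZ(p) \
	(((p)->p_flag & SK_PK_32) ? sizeof(int32_t) : sizeof(void *))

/* Bytes occupied in the process's memory by a vector of nptrs pointers. */
size_t
sk_vector_bytes(const struct sk_proc *p, size_t nptrs)
{
	assert(p != NULL);
	return nptrs * SK_PROC_PTRSZ(p);
}
```

Code that walks user-space pointer arrays (argv, auxv and the like) would use
the per-process size rather than sizeof(void *).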


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
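
As a sketch of how the three pieces can be recombined into a traditional
wait(2) status word: the struct, the function and the SKETCH_COREFLAG
constant below are hypothetical names, and the bit layout is the
conventional one (exit code in bits 8-15, signal in the low 7 bits, core
flag 0x80), assumed here rather than taken from this revision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_COREFLAG 0x80	/* conventional "core dumped" bit */

/* Hypothetical holder for the three separated values. */
struct exit_info {
	int xexit;		/* exit code */
	int xsig;		/* terminating signal, 0 if none */
	bool coredumped;	/* process dumped core */
};

/* Build the composite status that wait(2) callers expect. */
int
compose_wait_status(const struct exit_info *ei)
{
	assert(ei != NULL);

	if (ei->xsig != 0) {
		/* Killed by a signal: low bits carry the signal number. */
		return (ei->xsig & 0x7f) |
		    (ei->coredumped ? SKETCH_COREFLAG : 0);
	}
	/* Normal exit: exit code goes into the second byte. */
	return (ei->xexit & 0xff) << 8;
}
```

Keeping the fields separate means no code path has to guess whether a stored
value is a status word or a bare signal number.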


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits execution of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.
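
A userland sketch of the technique using C11 <stdatomic.h>. This is an
assumption for illustration only: the kernel has its own atomic operations,
and the counter and function names here are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical global process counter, updated without taking a lock. */
static atomic_uint nprocs_sketch;

/* Account for a newly created process; returns the new count. */
unsigned
proc_count_inc(void)
{
	return atomic_fetch_add(&nprocs_sketch, 1) + 1;
}

/* Account for a process that went away; returns the new count. */
unsigned
proc_count_dec(void)
{
	unsigned old = atomic_fetch_sub(&nprocs_sketch, 1);

	assert(old > 0);	/* the counter must never underflow */
	return old - 1;
}
```

With an atomic read-modify-write, fork and exit paths do not need to hold a
global lock just to keep the count consistent.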


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.
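
The first item describes a common class of bug: a short relative timeout is
truncated to zero during unit conversion, and zero is then read as "no
timeout". The sketch below shows one generic way to avoid that by rounding
up. It is an assumption about the failure mode, not the actual fix; the
function name, the nanosecond input and the tick-based interface are all
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a relative timeout in nanoseconds to clock ticks.
 * Returns -1 if the deadline has already passed. Otherwise the result is
 * rounded up, so a positive timeout never becomes 0, which a sleep
 * primitive might interpret as "sleep without a timeout".
 */
int64_t
timeout_to_ticks(int64_t delta_ns, int hz)
{
	const int64_t ns_per_sec = 1000000000LL;

	assert(hz > 0);

	if (delta_ns <= 0)
		return -1;		/* already expired: do not sleep */

	/* Ceiling division keeps tiny timeouts at one tick or more. */
	return (delta_ns * hz + ns_per_sec - 1) / ns_per_sec;
}
```

A caller would return a timeout error immediately on -1 instead of going to
sleep.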


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time.


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but no longer do.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.
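
A rough sketch of the bookkeeping this entry describes, assuming hypothetical
sk_* names and ignoring the decay step that the real hooks also apply: the
child remembers how much estcpu it inherited, and only the remainder is
credited back to the parent at wait time.

```c
/* Hypothetical, simplified stand-in for the scheduler fields of a process. */
struct sk_sched {
	unsigned int estcpu;		/* estimated CPU usage */
	unsigned int estcpu_inherited;	/* portion copied from the parent */
};

/* At fork: the child starts with the parent's estcpu and records that. */
static void
sk_fork_hook(const struct sk_sched *parent, struct sk_sched *child)
{
	child->estcpu = parent->estcpu;
	child->estcpu_inherited = parent->estcpu;
}

/* At wait: credit the parent only with what the child accumulated itself. */
static void
sk_wait_hook(struct sk_sched *parent, const struct sk_sched *child)
{
	if (child->estcpu > child->estcpu_inherited)
		parent->estcpu += child->estcpu - child->estcpu_inherited;
}
```

Without the inherited field, a parent at estcpu 10 whose child did no work
would end at 20 after one fork-wait cycle, which is the doubling mentioned
above.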


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu are not likely to be increased.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
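
To illustrate why a fixed-point estcpu helps, here is a small sketch. It
assumes the traditional BSD decay factor of 2*loadav / (2*loadav + 1) and an
11-bit fraction; the sk_* names and the exact constants are assumptions for
illustration, not taken from this log entry.

```c
#include <stdint.h>

#define SK_FSHIFT	11			/* assumed fraction bits */
#define SK_FSCALE	(1u << SK_FSHIFT)

typedef uint32_t sk_fixpt_t;

/* Integer version: loadav and estcpu are whole numbers. */
static unsigned int
sk_decay_int(unsigned int loadav, unsigned int estcpu)
{
	unsigned int loadfac = 2 * loadav;

	return (loadfac * estcpu) / (loadfac + 1);
}

/* Fixed-point version: both arguments are scaled by SK_FSCALE. */
static sk_fixpt_t
sk_decay_fixpt(sk_fixpt_t loadav, sk_fixpt_t estcpu)
{
	uint64_t loadfac = 2u * (uint64_t)loadav;

	return (sk_fixpt_t)((loadfac * estcpu) / (loadfac + SK_FSCALE));
}
```

With a load average of 20 and an estcpu of 1, the integer version truncates
to 0, so the whole unit is lost in one step; the fixed-point version keeps
most of it (1998/2048 of a unit).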


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
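
The marker-skipping iteration can be sketched as below. This is a simplified
illustration using a hand-rolled singly linked list and a hypothetical
is_marker field; the actual macro and list layout in the kernel may differ.

```c
#include <stddef.h>

/* Hypothetical list node: real entries and marker entries share a list. */
struct sk_proc {
	int pid;
	int is_marker;		/* nonzero for placeholder entries */
	struct sk_proc *next;
};

/* Like a plain list walk, but the loop body never sees marker entries. */
#define SK_PROCLIST_FOREACH(var, head)					\
	for ((var) = (head); (var) != NULL; (var) = (var)->next)	\
		if ((var)->is_marker) continue; else

/* Count the real (non-marker) processes on the list. */
static int
sk_count_procs(struct sk_proc *head)
{
	struct sk_proc *p;
	int n = 0;

	SK_PROCLIST_FOREACH(p, head) {
		n++;
	}
	return n;
}
```

A marker lets an iterator remember its position while it drops the list lock
to call a function; ordinary walkers simply step over it.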


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep while awaiting the debugger's opinion
about the signal; this is not a problem. The hook is hence located in
issignal, at the place where normally SIGCHLD is sent to the debugger,
while the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception), and stop the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.
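
The powerpc/oea approach in the list above (count executable mappings per
segment, set the no-exec bit iff the count is 0) might look roughly like this
in outline. The sk_* names, the segment count and the boolean standing in for
the hardware bit are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_NSEGMENTS	16	/* assumed number of segments */

struct sk_pmap {
	unsigned int exec_count[SK_NSEGMENTS];	/* executable mappings */
	bool noexec[SK_NSEGMENTS];		/* models the no-exec bit */
};

static void
sk_pmap_init(struct sk_pmap *pm)
{
	for (unsigned int i = 0; i < SK_NSEGMENTS; i++) {
		pm->exec_count[i] = 0;
		pm->noexec[i] = true;	/* no executable mappings yet */
	}
}

/* A page in segment 'seg' gained execute permission. */
static void
sk_exec_enter(struct sk_pmap *pm, unsigned int seg)
{
	assert(seg < SK_NSEGMENTS);
	pm->exec_count[seg]++;
	pm->noexec[seg] = false;
}

/* A page in segment 'seg' lost execute permission or was unmapped. */
static void
sk_exec_remove(struct sk_pmap *pm, unsigned int seg)
{
	assert(seg < SK_NSEGMENTS && pm->exec_count[seg] > 0);
	pm->exec_count[seg]--;
	pm->noexec[seg] = (pm->exec_count[seg] == 0);
}
```

Because the hardware granularity is a whole segment, per-page permission still
has to be checked in software by the trap handler, as the entry notes.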


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.
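
The control flow described in this entry can be sketched as follows. The
sk_* types are hypothetical, and unlike the chooseproc() described above this
simplified version returns NULL on an empty run queue instead of waiting.

```c
#include <stddef.h>

struct sk_proc {
	int pid;
	struct sk_proc *next;	/* run queue linkage */
};

struct sk_runq {
	struct sk_proc *head;
};

/* Take the first runnable process off the queue (NULL if it is empty). */
static struct sk_proc *
sk_chooseproc(struct sk_runq *rq)
{
	struct sk_proc *p = rq->head;

	if (p != NULL) {
		rq->head = p->next;
		p->next = NULL;
	}
	return p;
}

/*
 * Decide what runs next. 'newp' may name the next process directly;
 * NULL means "let the scheduler pick". *switched is set to 0 when no
 * context switch is needed ("switching to self").
 */
static struct sk_proc *
sk_mi_switch(struct sk_runq *rq, struct sk_proc *cur, struct sk_proc *newp,
    int *switched)
{
	if (newp == NULL)
		newp = sk_chooseproc(rq);
	if (newp == NULL || newp == cur) {
		*switched = 0;
		return cur;
	}
	*switched = 1;
	return newp;
}
```

Separating "who runs next" from "how to switch" is what lets the caller hand
over a chosen process and skip the queue lookup entirely.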


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct emul to provide an emulation-specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
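
A trap handler using such a hook could dispatch along these lines. This is a
sketch only: the sk_* types are hypothetical, the fault-type argument is
dropped for brevity, and the counters exist purely so the behaviour can be
observed.

```c
#include <stddef.h>

struct sk_vm_map {
	int faults;		/* faults resolved by the default handler */
};

struct sk_proc;

struct sk_emul {
	/* NULL means "use the default fault handler". */
	int (*e_fault)(struct sk_proc *, unsigned long, int);
};

struct sk_proc {
	const struct sk_emul *p_emul;
	struct sk_vm_map *p_map;
	int emul_faults;	/* faults seen by the emulation hook */
};

/* Stand-in for the default handler: takes a map, not a process. */
static int
sk_uvm_fault(struct sk_vm_map *map, unsigned long va, int prot)
{
	(void)va;
	(void)prot;
	map->faults++;
	return 0;
}

/* Stand-in for an emulation hook: takes the process, then delegates. */
static int
sk_emul_fault(struct sk_proc *p, unsigned long va, int prot)
{
	p->emul_faults++;	/* e.g. propagate the change to peers */
	return sk_uvm_fault(p->p_map, va, prot);
}

/* What the trap handler would do on a page fault. */
static int
sk_trap_fault(struct sk_proc *p, unsigned long va, int prot)
{
	if (p->p_emul->e_fault != NULL)
		return (*p->p_emul->e_fault)(p, va, prot);
	return sk_uvm_fault(p->p_map, va, prot);
}
```

Passing the process rather than the map is what gives the emulation enough
context to find the other members of a share group.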


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.
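
In outline, the fix described here might work like the sketch below. The flag
name, the sk_* structure and the singly linked process list are assumptions
for illustration; only the idea (flag the old parent, scan on its exit) comes
from the entry above.

```c
#include <stddef.h>

#define SK_P_CHTRACED	0x1	/* hypothetical: a child was reparented */

struct sk_proc {
	int p_flag;
	struct sk_proc *p_opptr;	/* old parent, saved by ptrace */
	struct sk_proc *next;		/* link in the list of all processes */
};

/* ptrace attach: remember the old parent and flag it. */
static void
sk_ptrace_reparent(struct sk_proc *child, struct sk_proc *old_parent)
{
	child->p_opptr = old_parent;
	old_parent->p_flag |= SK_P_CHTRACED;
}

/* Process exit: clear stale old-parent pointers, but only if flagged. */
static void
sk_exit_fixup(struct sk_proc *allproc, struct sk_proc *exiting)
{
	struct sk_proc *q;

	if ((exiting->p_flag & SK_P_CHTRACED) == 0)
		return;		/* common case: skip the full scan */
	for (q = allproc; q != NULL; q = q->next) {
		if (q->p_opptr == exiting)
			q->p_opptr = NULL;
	}
}
```

The flag keeps the full process-list scan off the normal exit path, so only
processes that actually lost a child to a debugger pay for it.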


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.
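
The trampoline selection outlined in the list above can be sketched like this.
The field names (sd_handler, sd_tramp, sd_vers), the signal count and the
placeholder legacy trampoline are assumptions for illustration; only the rule
"version 0 means the kernel-provided trampoline" comes from the entry.

```c
#include <stddef.h>

#define SK_NSIG	32		/* assumed number of signals */

typedef void (*sk_handler_t)(int);

/* Per-signal descriptor: the action plus trampoline pointer and version. */
struct sk_sigdesc {
	sk_handler_t sd_handler;
	const void *sd_tramp;	/* userland-provided trampoline, if any */
	int sd_vers;		/* 0 = legacy kernel-provided trampoline */
};

struct sk_sigacts {
	struct sk_sigdesc sa_sigdesc[SK_NSIG];
};

/* Placeholder object standing in for the kernel's own trampoline code. */
static const char sk_legacy_tramp[] = "legacy";

/* Look the trampoline up in the sigacts of the process getting the signal. */
static const void *
sk_select_tramp(const struct sk_sigacts *ps, int sig)
{
	const struct sk_sigdesc *sd = &ps->sa_sigdesc[sig];

	if (sd->sd_vers == 0)
		return sk_legacy_tramp;
	return sd->sd_tramp;
}
```

Since the delivery code has to consult the descriptor for the trampoline
anyway, it can fetch the handler from the same place, which is why the handler
no longer needs to be passed in.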


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed the obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specificities should now be handled in OS-specific code. Linux
has been done, but other emulations still need to be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data that has been collected so far:

                                   Net  Free  Open  Linux  SunOS  AIX  OSF1  Darwin
send SIGIO to write end of pipe    Y    N     N     N      N      N    Y     Y
send SIGIO to read end of pipe     Y    Y     N     N      N      ?    Y     ?
send SIGIO to write end of socket  Y    Y     Y     N      N      Y    Y     Y
send SIGIO to read end of socket   Y    Y     Y     Y      Y      ?    Y     ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke several
emulation package build


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN, clear the P_SUSPEND
flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
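
The compatibility macro described above can be sketched as follows. This is a minimal illustration, not the kernel's code: the `_sketch` names, the stub lock type, and the stub body of ltsleep are all assumptions made so the fragment compiles on its own. The only point shown is that the old four-argument call forwards to the new five-argument one with no interlock.

```c
#include <stddef.h>

/* Hypothetical stand-in for a simple lock; the real type is not shown here. */
struct simplelock_sketch {
	int locked;
};

/* Bookkeeping so the wrapper's behaviour can be observed. */
static struct simplelock_sketch *last_interlock;
static int ltsleep_calls;

/*
 * Stub for the new interface. A real implementation would sleep; this one
 * only records its interlock argument and releases the interlock if given.
 */
static int
ltsleep_sketch(const void *chan, int pri, const char *wmesg, int timo,
    struct simplelock_sketch *interlock)
{
	(void)chan;
	(void)pri;
	(void)wmesg;
	(void)timo;
	last_interlock = interlock;
	ltsleep_calls++;
	if (interlock != NULL)
		interlock->locked = 0;
	return 0;
}

/* Old API expressed on top of the new one: there is no interlock to pass. */
#define tsleep_sketch(chan, pri, wmesg, timo) \
	ltsleep_sketch((chan), (pri), (wmesg), (timo), NULL)
```

Existing callers keep their four-argument form, while MP-aware callers can switch to the five-argument form one at a time.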


# 1.97 31-May-2000 thorpej

Track which CPU a process is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
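
The sizing rule described above (a quarter of physical memory, then clamped by machine-dependent bounds) can be sketched as below. The function and parameter names are hypothetical, and the bounds are passed in as plain arguments rather than taken from NKMEMPAGES_MIN and NKMEMPAGES_MAX; this is an illustration of the arithmetic only.

```c
/*
 * Sketch of the auto-sizing rule: start from physical pages / 4, then
 * clamp the result into [min_pages, max_pages].
 */
static unsigned long
nkmempages_sketch(unsigned long phys_pages, unsigned long min_pages,
    unsigned long max_pages)
{
	unsigned long n = phys_pages / 4;

	if (n < min_pages)
		n = min_pages;	/* lower bound wins on small machines */
	if (n > max_pages)
		n = max_pages;	/* upper bound wins on large machines */
	return n;
}
```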


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). Process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
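
A predicate of the kind described above might look like the following sketch. The numeric state values and the `_SK` / `_sketch` names are placeholders chosen for illustration; only the idea that both SZOMB and SDEAD count as "zombie" comes from the message.

```c
/* Placeholder state values; the real constants are not shown in this log. */
enum proc_state_sketch {
	SRUN_SK = 1,
	SZOMB_SK,
	SDEAD_SK
};

/* Minimal stand-in for struct proc with just the state field. */
struct proc_sketch {
	enum proc_state_sketch p_stat;
};

/* True for a process that is dead, whether or not the reaper has seen it. */
#define P_ZOMBIE_SKETCH(p) \
	((p)->p_stat == SZOMB_SK || (p)->p_stat == SDEAD_SK)
```

Code that previously compared against the zombie state alone would use the macro instead, so a process that is dead but not yet reaped is not mistaken for a live one.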


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is the
header files as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
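
The effect of an NZERO of 20 can be sketched as below, assuming the user-visible nice range is -20 to 20 and that the stored value is the nice value offset by NZERO. The `_SKETCH` macros and both helper names are hypothetical; the sketch only shows why the stored value then fits an unsigned char.

```c
#define NZERO_SKETCH	20	/* value named in the commit message */
#define PRIO_MIN_SKETCH	(-20)	/* assumed lowest user-visible nice value */
#define PRIO_MAX_SKETCH	20	/* assumed highest user-visible nice value */

/* Map a user-visible nice value to the stored, never-negative p_nice. */
static unsigned char
nice_to_p_nice_sketch(int nice)
{
	if (nice < PRIO_MIN_SKETCH)
		nice = PRIO_MIN_SKETCH;
	if (nice > PRIO_MAX_SKETCH)
		nice = PRIO_MAX_SKETCH;
	return (unsigned char)(nice + NZERO_SKETCH);
}

/* Inverse mapping, back to the user-visible value. */
static int
p_nice_to_nice_sketch(unsigned char p_nice)
{
	return (int)p_nice - NZERO_SKETCH;
}
```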


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location, and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.354 21-Jun-2019 kamil

Eliminate PS_NOTIFYSTOP remnants from the kernel

This flag used to be useful in /proc (BSD4.4-style) debugging semantics.
Traced child events were notified without signaling the parent.

This property was removed in NetBSD-8.0 and had no users.

This change simplifies the signal code, removing dead branches.

NFCI


# 1.353 11-Jun-2019 kamil

Add support for PTRACE_POSIX_SPAWN to report posix_spawn(3) events

posix_spawn(3) is a first class syscall in NetBSD, different to
(V)FORK+EXEC as these operations are executed in one go. This differs to
Linux and FreeBSD, where posix_spawn(3) is implemented with existing kernel
primitives (clone(2), vfork(2), exec(3)) inside libc.

Typically LLDB and GDB software is aware of FORK/VFORK events. As discussed
with the LLDB community, instead of slicing the posix_spawn(3) operation
into phases emulating (V)FORK+EXEC(+VFORK_DONE) and returning intermediate
state to the debugger, that might have abnormal state, introduce new event
type: PTRACE_POSIX_SPAWN.

A debugger implementor can easily map it into existing fork+exec semantics
or treat as a distinct event.

There is no functional change for existing debuggers as there was no
support for reporting posix_spawn(3) events on the kernel side.


Revision tags: phil-wifi-20190609 isaki-audio2-base
# 1.352 06-Apr-2019 kamil

Centralized shared part of child_return() into MI part

Add a new function md_child_return() for MD specific bits only.

New child_return() is now part of MI and central code that handles
uniformly tracing code (KTR and ptrace(2)).

Synchronize value passed to ktrsysret() among ports to SYS_fork. This is
a traditional value and accessing p_lflag to check for PL_PPWAIT shall
use locking against proc_lock. Returning SYS_fork vs SYS_vfork still isn't
correct enough as there are more entry points to forking code. Instead of
making it too good, just settle with plain SYS_fork for all ports.


# 1.351 01-Mar-2019 christos

PR/53998: Joel Bertrand: Limit the number of semaphores on a
per-user basis not a per-process. We cannot really keep track on
a per-process basis because a parent process can create the semaphore
and a child can free it taking credit for it. There is also a
similar issue about resource exhaustion if we limited the number
of lwps per process as opposed to per user (which we don't).


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.350 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
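
The tri-state policy listed above can be sketched as a small decision function. Everything here is an assumption for illustration: the enum, the function names, and the idea of passing "has /dev/kmem open" as a flag are not taken from the kernel source, and the real get_expose_address() may be structured differently.

```c
/* The three settings of the sysctl, as listed in the commit message. */
enum expose_mode_sketch {
	EXPOSE_NONE = 0,	/* no access */
	EXPOSE_KMEM = 1,	/* processes with /dev/kmem open */
	EXPOSE_ALL = 2		/* everyone */
};

/* Default setting: 0 on KASLR kernels, 1 otherwise. */
static int
default_expose_mode_sketch(int kaslr)
{
	return kaslr ? EXPOSE_NONE : EXPOSE_KMEM;
}

/* Decide whether kernel addresses may be shown to the requesting process. */
static int
may_expose_address_sketch(int mode, int has_kmem_open)
{
	switch (mode) {
	case EXPOSE_ALL:
		return 1;
	case EXPOSE_KMEM:
		return has_kmem_open != 0;
	default:
		return 0;
	}
}
```

Evaluating the decision once per sysctl request, rather than once per process being reported, is the efficiency point the message makes.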


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.349 10-Aug-2018 pgoyette

Allow syscall_establish() to install new syscalls when the existing
entry-point is either sys_nomodule or sys_nosys. Update the
makesyscalls.sh script to create a const array of bits to allow
syscall_disestablish() to properly restore the original entry-point.
Update all the initializers of struct emul to initialize the pointer
to the bit array struct emul.

XXX Regen of all files created by makesyscalls.sh will come soon,
XXX followed by a kernel version bump (since struct emul is being
XXX modified).

This commit should address PR kern/45781 and also removes the need
for the work-around for that PR in file

sys/arch/usermode/modules/syscallemu/syscallemu.c


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.348 09-May-2018 kre

branches: 1.348.2;

Cause a process's user and system times to become non-decreasing.

This alters the invented values (ie: statistically calculated)
that are returned - for small values, the values are likely going to
be different than they were, but that's largely nonsense anyway
(except that the sum of utime & stime does equal cpu time consumed
by the process). Once the values get large enough to be meaningful
the difference made by this change will be in the noise, and irrelevant.

This needs a couple of additions to struct proc, so we are now into 8.99.17
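
One way to get non-decreasing values while keeping utime + stime equal to
the total is sketched below. This is an assumption-laden illustration, not
the code from this commit: the struct, field and function names are
hypothetical, and the clamping rule is just one scheme that satisfies both
properties. It does show why extra per-process state (the previously
reported values) is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-process state: the values reported last time. */
struct demo_ru {
	uint64_t last_u;	/* last reported user time */
	uint64_t last_s;	/* last reported system time */
};

/*
 * Split 'total' CPU time into user and system parts.  'est_u' is the
 * statistically estimated user share.  The result never goes below the
 * previously reported values, and u + s always equals total.
 */
static void
split_cpu_time(struct demo_ru *ru, uint64_t total, uint64_t est_u,
    uint64_t *up, uint64_t *sp)
{
	/* Total CPU time itself must not go backwards. */
	assert(total >= ru->last_u + ru->last_s);

	uint64_t lo = ru->last_u;		/* u may not shrink */
	uint64_t hi = total - ru->last_s;	/* s may not shrink */
	uint64_t u = est_u;

	if (u < lo)
		u = lo;
	if (u > hi)
		u = hi;

	uint64_t s = total - u;

	ru->last_u = u;
	ru->last_s = s;
	*up = u;
	*sp = s;
}
```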


# 1.347 06-May-2018 kamil

Remove an element from struct emul: e_tracesig

e_tracesig used to be implemented for Darwin compat. Nowadays the Darwin
compatibility layer is gone and there are no other users.

This functionality isn't used where it shall be used in the existing
codebase.

If we want to emulate debugging interfaces in compat layers we would need
to implement that from scratch anyway. We would need to be bug compatible
with other OSes too.

Proposed on tech-kern@.

Welcome to NetBSD 8.99.16!

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422
# 1.346 19-Apr-2018 christos

s/static inline/static __inline/g for consistency with other include
headers.


# 1.345 16-Apr-2018 kamil

Remove the rnewprocp argument from fork1(9)

It's now unused and it can cause use-after-free scenarios as noted by
<Mateusz Guzik>.

Reference: http://mail-index.netbsd.org/tech-kern/2017/09/08/msg022267.html

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.344 09-Jan-2018 maya

branches: 1.344.2;
remove struct emul's e_fault.

It used to be used by COMPAT_IRIX for the purpose of overriding
uvm_fault (only implemented in MIPS), now removed.

Ride 8.99.12 version bump.


Revision tags: tls-maxphys-base-20171202
# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->p_path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files, nor the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

branches: 1.340.6;
factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API and its usage.

This replaces the previous watchpoint API. The previous one was introduced
only recently in NetBSD-current, so remove all traces of it without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these value varies across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
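
The size and layout argument can be checked with a small standalone sketch.
The demo_ typedefs and the old_/new_ struct names below are stand-ins
introduced for this illustration, and both ID types are assumed to be
int32_t as stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for pid_t and lwpid_t; both assumed to be int32_t. */
typedef int32_t demo_pid_t;
typedef int32_t demo_lwpid_t;

struct old_ptrace_state {
	int		pe_report_event;
	demo_pid_t	pe_other_pid;
};

struct new_ptrace_state {
	int		pe_report_event;
	union {
		demo_pid_t	_pe_other_pid;
		demo_lwpid_t	_pe_lwp;
	} _option;
};

/* Same size and same member offset: old binaries keep working. */
static_assert(sizeof(struct old_ptrace_state) ==
    sizeof(struct new_ptrace_state), "size must not change");
static_assert(offsetof(struct old_ptrace_state, pe_other_pid) ==
    offsetof(struct new_ptrace_state, _option), "offset must not change");

/* Accessor macros keep existing source code compiling unchanged. */
#define pe_other_pid	_option._pe_other_pid
#define pe_lwp		_option._pe_lwp

/* Read the LWP id from an LWP create/exit event report. */
static demo_lwpid_t
reported_lwp(const struct new_ptrace_state *ps)
{
	return ps->pe_lwp;
}
```

The macros are defined after the static assertions on purpose, so that the
old struct's plain member name is not rewritten by the preprocessor.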


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP termination is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
parent gives birth to new process child and stops till it exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE notifies a debugger that a parent has
resumed after vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
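
A minimal sketch of what such a pointer-size helper might look like,
assuming a per-process flag marks 32-bit emulated processes (PK_32 is
mentioned elsewhere in this log). The demo_ names and the flag value are
hypothetical, and this is not necessarily how PROC_PTRSZ() is defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PK_32	0x00000004	/* hypothetical "32-bit process" flag */

struct demo_proc {
	int	p_flag;
};

/*
 * Size of a userland pointer for the given process: 4 bytes when a
 * 64-bit kernel runs a 32-bit process, the native size otherwise.
 */
static size_t
demo_proc_ptrsz(const struct demo_proc *p)
{
	assert(p != NULL);
	return (p->p_flag & DEMO_PK_32) ? sizeof(uint32_t) : sizeof(void *);
}
```

Code that copies pointer arrays (argv, auxv and the like) to or from
userland can then step by this size instead of sizeof(void *).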


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
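
With the pieces stored separately, a composite wait(2) status is only built
when it is needed. The sketch below assumes the traditional BSD encoding
(exit code in bits 8-15, signal in the low 7 bits, core flag 0x80); the
struct, macro and function names are hypothetical and not taken from the
kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the three separate pieces. */
struct demo_exit_info {
	int	xexit;		/* exit code, cf. p_xexit */
	int	xsig;		/* terminating signal or 0, cf. p_xsig */
	bool	coredumped;	/* cf. the core-dump bit in p_sflag */
};

#define DEMO_COREFLAG	0x80	/* traditional core-dump bit (assumed) */

/* Rebuild a traditional wait(2)-style status word from the pieces. */
static int
demo_wait_status(const struct demo_exit_info *ei)
{
	assert(ei->xsig >= 0 && ei->xsig < 0x7f);

	int status = ((ei->xexit & 0xff) << 8) | ei->xsig;

	if (ei->coredumped)
		status |= DEMO_COREFLAG;
	return status;
}
```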


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub to call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits execution of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give the same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.
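
The first two items can be illustrated with a small reference-counting sketch. All names here (`cred_sketch`, `lwp_update_creds_sketch()` and so on) are hypothetical, and the locking a kernel would need around the reference counts is left out.

```c
#include <assert.h>
#include <stddef.h>

struct cred_sketch { int refcnt; int uid; };
struct proc_sketch { struct cred_sketch *p_cred; };
struct lwp_sketch {
    struct proc_sketch *l_proc;
    struct cred_sketch *l_cred; /* this LWP's own reference */
};

static void cred_hold(struct cred_sketch *c) { c->refcnt++; }

static void cred_free(struct cred_sketch *c)
{
    assert(c->refcnt > 0);
    c->refcnt--;
}

/*
 * Called on syscall entry or user trap: refresh the LWP's reference
 * only if the process credential pointer has changed.
 */
static void lwp_update_creds_sketch(struct lwp_sketch *l)
{
    struct cred_sketch *pc = l->l_proc->p_cred;

    if (l->l_cred == pc)
        return;               /* common case: a single pointer compare */
    cred_hold(pc);
    if (l->l_cred != NULL)
        cred_free(l->l_cred); /* drop the reference to the old credential */
    l->l_cred = pc;
}
```

Between two such refresh points the LWP keeps using the credential it holds, so a concurrent change to p_cred does not pull the structure out from under it.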


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but do no more.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.
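
A simplified sketch of the fork/wait accounting described above. The field `p_forked_estcpu` and the `*_sketch()` functions are names assumed for illustration, and the decay step factored out of updatepri is deliberately omitted, so this shows only why the inherited share must not be added back.

```c
#include <assert.h>

struct proc_sketch {
    unsigned p_estcpu;        /* estimated recent CPU usage */
    unsigned p_forked_estcpu; /* hypothetical: amount inherited at fork */
};

/* At fork: the child inherits the estimate and records how much. */
static void fork_hook_sketch(const struct proc_sketch *parent,
    struct proc_sketch *child)
{
    assert(parent != child);
    child->p_estcpu = parent->p_estcpu;
    child->p_forked_estcpu = parent->p_estcpu;
}

/* At wait: charge the parent only for what the child used itself. */
static void wait_hook_sketch(struct proc_sketch *parent,
    const struct proc_sketch *child)
{
    if (child->p_estcpu > child->p_forked_estcpu)
        parent->p_estcpu += child->p_estcpu - child->p_forked_estcpu;
}
```

Without the recorded inheritance, a wait hook that adds the child's whole estimate to the parent would double the parent's value on every fork-wait cycle, even if the child never ran.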


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
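
To see why fractional bits help, here is a small sketch. It assumes the classic BSD decay factor 2*load / (2*load + 1) and an 11-bit fraction; `SK_FSHIFT`, `SK_FSCALE` and `decay_sketch()` are names made up for this example and the code is not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

#define SK_FSHIFT 11                /* assumed number of fraction bits */
#define SK_FSCALE (1u << SK_FSHIFT)

/*
 * Apply one decay step to cpu. loadav_fix is the load average in
 * fixed point (load * SK_FSCALE); cpu may be a plain integer or a
 * fixed-point value, the arithmetic is the same.
 */
static uint32_t decay_sketch(uint32_t loadav_fix, uint32_t cpu)
{
    uint64_t loadfac = 2ull * loadav_fix;

    assert(loadfac + SK_FSCALE != 0);
    return (uint32_t)((loadfac * cpu) / (loadfac + SK_FSCALE));
}
```

With a load average of 20, `decay_sketch(20 * SK_FSCALE, 1)` truncates a plain integer estimate of 1 to 0, which is the "decreases by at least 1" behaviour from item 1. The same estimate stored as `1 * SK_FSCALE` keeps most of its value after the step, so it can decay slowly when the load is high.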


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
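
The marker technique can be sketched as below. This is an illustration under assumptions, not the kernel implementation: it uses a hand-rolled circular list rather than the kernel's list macros, and every identifier ending in `_sketch` (plus `SKETCH_FOREACH`) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list entry; the list is circular through a sentinel head. */
struct entry_sketch {
    struct entry_sketch *prev, *next;
    bool is_marker; /* true for placeholders, which are not real processes */
    int pid;
};

static void list_init(struct entry_sketch *head)
{
    head->prev = head->next = head;
    head->is_marker = false;
    head->pid = -1;
}

static void insert_after(struct entry_sketch *pos, struct entry_sketch *e)
{
    e->prev = pos;
    e->next = pos->next;
    pos->next->prev = e;
    pos->next = e;
}

static void list_remove(struct entry_sketch *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* First non-marker entry at or after e, or NULL once the head is reached. */
static struct entry_sketch *
skip_markers(struct entry_sketch *e, const struct entry_sketch *head)
{
    while (e != head && e->is_marker)
        e = e->next;
    return e == head ? NULL : e;
}

/* Iterate like a plain list walk, but never hand out marker entries. */
#define SKETCH_FOREACH(var, head)                                   \
    for ((var) = skip_markers((head)->next, (head)); (var) != NULL; \
        (var) = skip_markers((var)->next, (head)))

/* Call fn for each entry; a marker remembers the position across the call. */
static int
foreach_call_sketch(struct entry_sketch *head,
    int (*fn)(struct entry_sketch *, void *), void *arg)
{
    struct entry_sketch marker = { .is_marker = true };
    struct entry_sketch *e = skip_markers(head->next, head);
    int rv = 0;

    assert(fn != NULL);
    while (e != NULL && rv == 0) {
        insert_after(e, &marker);
        rv = fn(e, arg); /* a kernel would drop the list lock around this */
        e = skip_markers(marker.next, head);
        list_remove(&marker);
    }
    return rv;
}

/* Example callback: add up pids and unlink even-numbered entries. */
static int sum_and_unlink_even(struct entry_sketch *e, void *arg)
{
    *(int *)arg += e->pid;
    if (e->pid % 2 == 0)
        list_remove(e);
    return 0;
}

static int count_entries(struct entry_sketch *head)
{
    struct entry_sketch *e;
    int n = 0;

    SKETCH_FOREACH(e, head)
        n++;
    return n;
}
```

Because the walk resumes from the marker and not from the entry just visited, the callback may unlink that entry without derailing the iteration.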


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep while awaiting the debugger's opinion
about the signal; this is not a problem. The hook is hence located in
issignal, at the place where normally SIGCHLD is sent to the debugger,
while the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception), and stop the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.
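
The new calling convention can be sketched like this. It is a toy model under assumptions: a single global run queue, a `chooseproc_sketch()` that returns NULL where the real function would wait for a process to appear, and `*_sketch` names throughout that are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct proc_sketch {
    int pid;
    struct proc_sketch *next;
};

static struct proc_sketch *runq_head;      /* hypothetical single run queue */
static struct proc_sketch *curproc_sketch; /* currently running process */
static int context_switches;

/* Append p to the tail of the run queue. */
static void setrunqueue_sketch(struct proc_sketch *p)
{
    struct proc_sketch **pp = &runq_head;

    assert(p != NULL);
    while (*pp != NULL)
        pp = &(*pp)->next;
    p->next = NULL;
    *pp = p;
}

/* Take the first runnable process, or NULL if the queue is empty. */
static struct proc_sketch *chooseproc_sketch(void)
{
    struct proc_sketch *p = runq_head;

    if (p != NULL) {
        runq_head = p->next;
        p->next = NULL;
    }
    return p;
}

/* Switch to newp, or let the scheduler choose when newp is NULL. */
static int mi_switch_sketch(struct proc_sketch *newp)
{
    if (newp == NULL)
        newp = chooseproc_sketch();
    if (newp == NULL || newp == curproc_sketch)
        return 0;          /* "switching to self": nothing to save or load */
    curproc_sketch = newp; /* stands in for the cpu_switch() work */
    context_switches++;
    return 1;
}
```

Passing the target explicitly keeps the policy (who runs next) out of the switching code, which is the separation the commit message describes.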


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct emul to provide emulation specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and to synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
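
The dispatch described above can be sketched as follows. The types are simplified stand-ins (plain `int` for the fault type and protection, `unsigned long` for the address), and `trap_fault_sketch()`, `sharegroup_fault_sketch()` and the other `*_sketch` names are hypothetical; only the "NULL means use uvm_fault" rule comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

struct vm_map_sketch { int faults; };
struct proc_sketch;

/* Same shape as the e_fault prototype above: a process, not a map. */
typedef int (*e_fault_fn)(struct proc_sketch *, unsigned long, int, int);

struct emul_sketch {
    const char *e_name;
    e_fault_fn e_fault; /* NULL means: use the default fault handler */
};

struct proc_sketch {
    const struct emul_sketch *p_emul;
    struct vm_map_sketch *p_map;
};

/* Stand-in for uvm_fault(): just count the fault and succeed. */
static int uvm_fault_sketch(struct vm_map_sketch *map, unsigned long va,
    int type, int prot)
{
    (void)va; (void)type; (void)prot;
    map->faults++;
    return 0;
}

/* What a trap handler would call to resolve a fault. */
static int trap_fault_sketch(struct proc_sketch *p, unsigned long va,
    int type, int prot)
{
    assert(p != NULL && p->p_emul != NULL);
    if (p->p_emul->e_fault != NULL)
        return p->p_emul->e_fault(p, va, type, prot);
    return uvm_fault_sketch(p->p_map, va, type, prot);
}

static int propagated; /* counts simulated share-group updates */

/* Example hook: resolve the fault, then propagate to group members. */
static int sharegroup_fault_sketch(struct proc_sketch *p, unsigned long va,
    int type, int prot)
{
    int error = uvm_fault_sketch(p->p_map, va, type, prot);

    if (error == 0)
        propagated++;  /* stands in for syncing the other VM spaces */
    return error;
}
```

Taking the process instead of the map is what lets a hook such as the share-group one find the other members whose VM spaces need updating.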


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.
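
A rough sketch of the per-signal descriptor and the version-based selection described above. The layout is simplified (a bare handler pointer stands in for the full sigaction), all `*_sketch` names are hypothetical, and the rule that a trampoline pointer and a nonzero version must be supplied together is an assumption of this example rather than something stated in the commit message.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*handler_fn)(int);

struct sigact_sigdesc_sketch {
    handler_fn  sd_handler; /* stands in for the full sigaction */
    const void *sd_tramp;   /* userland-provided trampoline, or NULL */
    int         sd_vers;    /* 0 = legacy kernel-provided trampoline */
};

#define SK_NSIG 32

struct sigacts_sketch {
    struct sigact_sigdesc_sketch sa_sigdesc[SK_NSIG];
};

/* Placeholder for the trampoline the kernel itself would supply. */
static const char legacy_trampoline_sketch[] = "kernel sigcode";

/* Register a handler together with its trampoline and version. */
static int register_sigtramp_sketch(struct sigacts_sketch *ps, int sig,
    handler_fn h, const void *tramp, int vers)
{
    if (sig <= 0 || sig >= SK_NSIG)
        return -1;
    if ((tramp == NULL) != (vers == 0))
        return -1; /* assumed rule: pointer and version go together */
    ps->sa_sigdesc[sig] = (struct sigact_sigdesc_sketch){ h, tramp, vers };
    return 0;
}

/* What a sendsig()-like function would do to pick the trampoline. */
static const void *select_trampoline_sketch(const struct sigacts_sketch *ps,
    int sig)
{
    const struct sigact_sigdesc_sketch *sd;

    assert(sig > 0 && sig < SK_NSIG);
    sd = &ps->sa_sigdesc[sig];
    return sd->sd_vers == 0 ? legacy_trampoline_sketch : sd->sd_tramp;
}
```

Since the handler and trampoline live side by side in the descriptor, a single lookup at delivery time yields both, which matches the reasoning given for no longer passing the handler in.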


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specifics should now be handled in OS-specific code. Linux
has been done, but other emulations should be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data that has been collected so far:

                                   Net  Free  Open  Linux  SunOS  AIX  OSF1  Darwin
send SIGIO to write end of pipe    Y    N     N     N      N      N    Y     Y
send SIGIO to read end of pipe     Y    Y     N     N      N      ?    Y     ?
send SIGIO to write end of socket  Y    Y     Y     N      N      Y    Y     Y
send SIGIO to read end of socket   Y    Y     Y     Y      Y      ?    Y     ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke several
emulation package builds


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
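
A minimal sketch of the compatibility wrapper described above, assuming the old API can be expressed as the new call with no interlock. The names `ltsleep_sketch`, `tsleep_sketch`, and `struct simplelock_sketch` are hypothetical stand-ins for illustration; the stub only models the release of the interlock and does not actually sleep.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a simple lock; not the kernel's type. */
struct simplelock_sketch {
	int locked;
};

/*
 * Stub modelling only the interlock hand-off: once the (imaginary)
 * scheduler lock is held, the caller's interlock is released, so an
 * awakener cannot slip in between the check and the sleep.
 */
static int
ltsleep_sketch(const void *ident, int priority, const char *wmesg,
    int timo, struct simplelock_sketch *interlock)
{
	(void)ident; (void)priority; (void)wmesg; (void)timo;

	/* ...scheduler would be locked here... */
	if (interlock != NULL)
		interlock->locked = 0;
	/* ...the process would be put to sleep here... */
	return 0;
}

/* Old-style API: same call, but with no interlock to release. */
#define tsleep_sketch(ident, priority, wmesg, timo) \
	ltsleep_sketch((ident), (priority), (wmesg), (timo), NULL)

/* Returns 1 if the interlock held by the caller was released. */
static int
interlock_released_demo(void)
{
	struct simplelock_sketch lk = { 1 };

	assert(lk.locked == 1);
	(void)ltsleep_sketch(&lk, 0, "demo", 0, &lk);
	return lk.locked == 0;
}
```

Existing callers of the old interface would keep compiling unchanged under this arrangement, while new callers can pass the lock that protects their wait condition.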


# 1.97 31-May-2000 thorpej

Track which CPU a process is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
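
The sizing rule above can be sketched as a clamp. This is an illustrative helper, not the kernel's actual code: the function name `nkmempages_autosize` and its parameters are assumptions, and all quantities are taken to be in pages.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: take one quarter of physical memory, then
 * clamp the result to machine-dependent lower and upper bounds.
 */
static size_t
nkmempages_autosize(size_t physmem_pages, size_t min_pages, size_t max_pages)
{
	size_t npages;

	assert(min_pages <= max_pages);

	npages = physmem_pages / 4;	/* a quarter of physical memory */
	if (npages < min_pages)
		npages = min_pages;	/* NKMEMPAGES_MIN-style lower bound */
	if (npages > max_pages)
		npages = max_pages;	/* NKMEMPAGES_MAX-style upper bound */
	return npages;
}
```

With bounds of 512 and 2048 pages, a machine with 4096 pages of physical memory would get 1024 pages, while very small or very large machines would be pinned to the respective bound.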


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). A process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock every where else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
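
A minimal sketch of the predicate described above, assuming a state field that can hold either of the two post-exit states. The enum values, `struct proc_sketch`, and the `_SKETCH` macro name are hypothetical and chosen only for illustration.

```c
/* Illustrative process states; the numeric values are assumptions. */
enum proc_state_sketch {
	ST_IDL = 1, ST_RUN, ST_SLEEP, ST_STOP, ST_ZOMB, ST_DEAD
};

struct proc_sketch {
	enum proc_state_sketch p_stat;
};

/* A process counts as a zombie if it is dead or already reaped to SZOMB. */
#define P_ZOMBIE_SKETCH(p) \
	((p)->p_stat == ST_ZOMB || (p)->p_stat == ST_DEAD)

/* Convenience wrapper so the predicate can be applied to a bare state. */
static int
state_is_zombie(enum proc_state_sketch st)
{
	struct proc_sketch p = { st };

	return P_ZOMBIE_SKETCH(&p);
}
```

Callers that previously compared against the zombie state alone would use the macro instead, so that a process between exit and reaping is not mistaken for a live one.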


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is the
header files as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
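
The offset scheme above can be sketched as follows, assuming user-visible nice values range from -20 to 20 and that the stored value is the user value plus NZERO. The helper names and the `_SKETCH` constant are hypothetical.

```c
#include <assert.h>

#define NZERO_SKETCH 20	/* stored value corresponding to nice 0 */

/* Map a user-visible nice value (-20..20, assumed) to the stored form. */
static unsigned char
nice_to_pnice(int nice)
{
	assert(nice >= -20 && nice <= 20);
	return (unsigned char)(nice + NZERO_SKETCH);	/* never negative */
}

/* Map the stored form back to the user-visible nice value. */
static int
pnice_to_nice(unsigned char p_nice)
{
	return (int)p_nice - NZERO_SKETCH;
}
```

Because the smallest user value maps to 0, the stored field fits an unsigned type and comparisons on it do not depend on the signedness of plain char.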


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location, and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.353 11-Jun-2019 kamil

Add support for PTRACE_POSIX_SPAWN to report posix_spawn(3) events

posix_spawn(3) is a first class syscall in NetBSD, different to
(V)FORK+EXEC as these operations are executed in one go. This differs to
Linux and FreeBSD, where posix_spawn(3) is implemented with existing kernel
primitives (clone(2), vfork(2), exec(3)) inside libc.

Typically LLDB and GDB software is aware of FORK/VFORK events. As discussed
with the LLDB community, instead of slicing the posix_spawn(3) operation
into phases emulating (V)FORK+EXEC(+VFORK_DONE) and returning intermediate
state to the debugger, that might have abnormal state, introduce new event
type: PTRACE_POSIX_SPAWN.

A debugger implementor can easily map it into existing fork+exec semantics
or treat as a distinct event.

There is no functional change for existing debuggers as there was no
support for reporting posix_spawn(3) events on the kernel side.


Revision tags: phil-wifi-20190609 isaki-audio2-base
# 1.352 06-Apr-2019 kamil

Centralized shared part of child_return() into MI part

Add a new function md_child_return() for MD specific bits only.

New child_return() is now part of MI and central code that handles
uniformly tracing code (KTR and ptrace(2)).

Synchronize value passed to ktrsysret() among ports to SYS_fork. This is
a traditional value and accessing p_lflag to check for PL_PPWAIT shall
use locking against proc_lock. Returning SYS_fork vs SYS_vfork still isn't
correct enough as there are more entry points to forking code. Instead of
making it too good, just settle with plain SYS_fork for all ports.


# 1.351 01-Mar-2019 christos

PR/53998: Joel Bertrand: Limit the number of semaphores on a
per-user basis not a per-process. We cannot really keep track on
a per-process basis because a parent process can create the semaphore
and a child can free it taking credit for it. There is also a
similar issue about resource exhaustion if we limited the number
of lwps per process as opposed to per user (which we don't).


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.350 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
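
The tri-state policy and the "check once per sysctl" point above can be
sketched roughly as follows. This is only an illustration under stated
assumptions: every sk_* name is hypothetical, and the real
get_expose_address() and fill_*proc code are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the three kern.expose_address levels. */
enum sk_expose {
	SK_EXPOSE_NONE = 0,	/* no access */
	SK_EXPOSE_KMEM = 1,	/* only callers holding /dev/kmem open */
	SK_EXPOSE_ALL = 2	/* everyone */
};

/* Default level as described above: 0 on KASLR kernels, 1 otherwise. */
static enum sk_expose
sk_default_expose(bool kaslr)
{
	return kaslr ? SK_EXPOSE_NONE : SK_EXPOSE_KMEM;
}

/* Decide whether a caller may see kernel addresses at this level. */
static bool
sk_may_expose(enum sk_expose level, bool caller_has_kmem_open)
{
	switch (level) {
	case SK_EXPOSE_ALL:
		return true;
	case SK_EXPOSE_KMEM:
		return caller_has_kmem_open;
	default:
		return false;
	}
}

/*
 * Copy n kernel addresses to out, zeroing them when access is denied.
 * The policy check is done once per request, not once per entry.
 */
static void
sk_fill_addrs(uintptr_t *out, const uintptr_t *kaddr, size_t n,
    enum sk_expose level, bool caller_has_kmem_open)
{
	const bool allow = sk_may_expose(level, caller_has_kmem_open);

	assert(out != NULL && kaddr != NULL);
	for (size_t i = 0; i < n; i++)
		out[i] = allow ? kaddr[i] : 0;
}
```

Hoisting the decision out of the loop is the whole of the efficiency
point: the answer depends on the caller and the sysctl value, not on
the process being reported.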


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.349 10-Aug-2018 pgoyette

Allow syscall_establish() to install new syscalls when the existing
entry-point is either sys_nomodule or sys_nosys. Update the
makesyscalls.sh script to create a const array of bits to allow
syscall_disestablish() to properly restore the original entry-point.
Update all the initializers of struct emul to initialize the pointer
to the bit array struct emul.
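
A toy model of the establish/disestablish rule described above, assuming
a four-entry table and one bitmap word. All sk_* names are made up for
illustration; the real sysent layout and the array generated by
makesyscalls.sh differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*sk_call_t)(void);

static int sk_nosys(void) { return -1; }	/* "no such syscall" stub */
static int sk_nomodule(void) { return -2; }	/* "module not loaded" stub */
static int sk_hello(void) { return 42; }	/* example handler to install */

#define SK_NSYS 4

/* Slots 1 and 3 start out as sk_nomodule, the rest as sk_nosys. */
static sk_call_t sk_table[SK_NSYS] = {
	sk_nosys, sk_nomodule, sk_nosys, sk_nomodule
};

/* Const bitmap: bit n set means slot n originally held sk_nomodule. */
static const uint32_t sk_nomodbits[1] = { 0xa };

/* Install fn only if the slot still holds one of the two stubs. */
static int
sk_establish(size_t n, sk_call_t fn)
{
	assert(fn != NULL);
	if (n >= SK_NSYS)
		return -1;
	if (sk_table[n] != sk_nosys && sk_table[n] != sk_nomodule)
		return -1;
	sk_table[n] = fn;
	return 0;
}

/* Restore whichever stub the slot had before anything was installed. */
static int
sk_disestablish(size_t n)
{
	if (n >= SK_NSYS)
		return -1;
	if (sk_nomodbits[n / 32] & (UINT32_C(1) << (n % 32)))
		sk_table[n] = sk_nomodule;
	else
		sk_table[n] = sk_nosys;
	return 0;
}
```

Without the bitmap, the disestablish path has no way of knowing which of
the two stubs to put back once the slot has been overwritten.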

XXX Regen of all files created by makesyscalls.sh will come soon,
XXX followed by a kernel version bump (since struct emul is being
XXX modified).

This commit should address PR kern/45781 and also removes the need
for the work-around for that PR in file

sys/arch/usermode/modules/syscallemu/syscallemu.c


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.348 09-May-2018 kre

branches: 1.348.2;

Cause a process's user and system times to become non-decreasing.

This alters the invented values (ie: statistically calculated)
that are returned - for small values, the values are likely going to
be different than they were, but that's largely nonsense anyway
(except that the sum of utime & stime does equal cpu time consumed
by the process). Once the values get large enough to be meaningful
the difference made by this change will be in the noise, and irrelevant.

This needs a couple of additions to struct proc, so we are now into 8.99.17
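
One way to get non-decreasing values while keeping utime + stime equal
to the cpu time consumed is to remember the last reported pair and clamp
against it. The sketch below is an assumption about the general
technique, not the code that was committed; the sk_* names and the
microsecond unit are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* The "couple of additions": the last values handed back to the caller. */
struct sk_times {
	uint64_t utime;		/* last reported user time, microseconds */
	uint64_t stime;		/* last reported system time, microseconds */
};

/*
 * Split total (cpu time consumed so far, assumed never to go backwards)
 * according to the statistical tick counts, then clamp so that neither
 * component is lower than what was reported last time.  The product
 * total * uticks is assumed not to overflow in this sketch.
 */
static void
sk_split_times(struct sk_times *last, uint64_t total,
    uint64_t uticks, uint64_t sticks)
{
	const uint64_t ticks = uticks + sticks;
	uint64_t u = ticks != 0 ? total * uticks / ticks : 0;
	uint64_t s = total - u;

	assert(total >= last->utime + last->stime);
	if (u < last->utime) {
		/* Keep user time where it was; system time absorbs the rest. */
		u = last->utime;
		s = total - u;
	} else if (s < last->stime) {
		/* Keep system time where it was; user time absorbs the rest. */
		s = last->stime;
		u = total - s;
	}
	last->utime = u;
	last->stime = s;
}
```

Because total never shrinks, at most one of the two components can fall
below its previous value, so a single clamp is enough and the sum still
matches total.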


# 1.347 06-May-2018 kamil

Remove an element from struct emul: e_tracesig

e_tracesig used to be implemented for Darwin compat. Nowadays the Darwin
compatibility layer is gone and there are no other users.

This functionality isn't used where it shall be used in the existing
codebase.

If we want to emulate debugging interfaces in compat layers we would need
to implement that from scratch anyway. We would need to be bug compatible
with other OSes too.

Proposed on tech-kern@.

Welcome to NetBSD 8.99.16!

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422
# 1.346 19-Apr-2018 christos

s/static inline/static __inline/g for consistency with other include
headers.


# 1.345 16-Apr-2018 kamil

Remove the rnewprocp argument from fork1(9)

It's now unused and it can cause use-after-free scenarios as noted by
<Mateusz Guzik>.

Reference: http://mail-index.netbsd.org/tech-kern/2017/09/08/msg022267.html

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.344 09-Jan-2018 maya

branches: 1.344.2;
remove struct emul's e_fault.

It used to be used by COMPAT_IRIX for the purpose of overriding
uvm_fault (only implemented in MIPS), now removed.

Ride 8.99.12 version bump.


Revision tags: tls-maxphys-base-20171202
# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change affects neither the other procfs files nor the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

branches: 1.340.6;
factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API and its usage.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current; remove its remnants without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
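
The size and layout claim above can be checked at compile time. A
minimal sketch, assuming both pid_t and lwpid_t are int32_t as the text
states; the sk_* typedefs and the old_/new_ struct names are stand-ins
for illustration and are not the kernel's headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in typedefs, assumed to match the "int32_t-like" description. */
typedef int32_t sk_pid_t;
typedef int32_t sk_lwpid_t;

/* Layout before the change. */
struct old_ptrace_state {
	int pe_report_event;
	sk_pid_t pe_other_pid;
};

/* Layout after the change. */
struct new_ptrace_state {
	int pe_report_event;
	union {
		sk_pid_t _pe_other_pid;
		sk_lwpid_t _pe_lwp;
	} _option;
};

/* Same size, and the payload sits at the same offset as before. */
static_assert(sizeof(struct old_ptrace_state) ==
    sizeof(struct new_ptrace_state), "size changed");
static_assert(offsetof(struct old_ptrace_state, pe_other_pid) ==
    offsetof(struct new_ptrace_state, _option), "offset changed");

/* Compatibility names, defined after the checks that use the old field. */
#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

/* Source written against the old field name still compiles. */
static sk_pid_t
sk_event_pid(const struct new_ptrace_state *ps)
{
	return ps->pe_other_pid;
}
```

The two macros are what keep existing source building unchanged, while
the identical size and offset are what keep prebuilt binaries working.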


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
parent gives birth to new process child and stops till it exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE notifies a debugger that a parent has
resumed after vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
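
A rough sketch of what such a macro might look like and how a caller
could use it. The flag value, the struct, and the SK_ names below are
assumptions for illustration; only the idea (pick the pointer size from
a per-process 32-bit flag such as PK_32) comes from the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PK_32 0x1	/* hypothetical flag: process runs under 32-bit emulation */

struct sk_proc {
	int p_flag;
};

/* Pointer size as seen by the process, not by the kernel. */
#define SK_PROC_PTRSZ(p) \
	(((p)->p_flag & SK_PK_32) ? sizeof(uint32_t) : sizeof(void *))

/* Bytes needed for an argv-style vector of nargs pointers plus a NULL. */
static size_t
sk_argv_bytes(const struct sk_proc *p, size_t nargs)
{
	assert(p != NULL);
	return (nargs + 1) * SK_PROC_PTRSZ(p);
}
```

Code that walks user-space pointer arrays would use the macro for its
stride instead of a bare sizeof(void *), which is wrong for a 32-bit
process on a 64-bit kernel.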


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
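
For illustration, the three pieces can be recombined into a wait(2)-style
status roughly like this. The sketch assumes the traditional BSD
encoding (signal number in the low 7 bits, core flag 0200, exit code in
bits 8 to 15); the sk_* names are hypothetical and this is not the
kernel's own helper.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_WCOREFLAG 0200	/* traditional core-dump bit in a wait status */

struct sk_exit_info {
	int xexit;	/* exit code, meaningful when xsig == 0 */
	int xsig;	/* terminating signal number, 0 for a normal exit */
	bool coredump;	/* stands in for the core-dump bit kept in p_sflag */
};

/* Build the composite status that the old p_xstat used to hold. */
static int
sk_wait_status(const struct sk_exit_info *xi)
{
	assert(xi->xsig >= 0 && xi->xsig < 0177);
	if (xi->xsig != 0)
		return xi->xsig | (xi->coredump ? SK_WCOREFLAG : 0);
	return (xi->xexit & 0xff) << 8;
}
```

Keeping the fields apart means no code path has to guess from context
whether a stored value is an exit code or a signal number.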


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It is implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture,
MAP_NOSYSCALLS is simply ignored with virtually no CPU cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.
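
As a portable illustration of the idea, a process counter maintained
with atomics instead of under a lock might look like the following. This
uses C11 <stdatomic.h> as a stand-in; the kernel has its own atomic
primitives, and the sk_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for the global process count. */
static atomic_uint sk_nprocs;

/* Called when a process is created; returns the new count. */
static unsigned
sk_proc_born(void)
{
	return atomic_fetch_add(&sk_nprocs, 1) + 1;
}

/* Called when a process is reaped; returns the new count. */
static unsigned
sk_proc_gone(void)
{
	assert(atomic_load(&sk_nprocs) > 0);
	return atomic_fetch_sub(&sk_nprocs, 1) - 1;
}
```

Each update is a single read-modify-write, so concurrent fork and exit
paths cannot lose an increment or decrement even though no mutex is
held around the counter.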


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, the lock is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time.


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout from the proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but do no more.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.
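
The bookkeeping described above can be sketched as follows. This is a simplified illustration under stated assumptions: the `sk_` names and the `estcpu_inherited` field are hypothetical, and the decay of the inherited share (the updatepri part of the change) is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the scheduler fields of struct proc. */
struct sk_sproc {
    unsigned estcpu;            /* estimated recent CPU usage */
    unsigned estcpu_inherited;  /* share copied from the parent at fork */
};

/* At fork: the child starts with the parent's estcpu and remembers it. */
static void sk_fork_hook(const struct sk_sproc *parent, struct sk_sproc *child)
{
    assert(parent != NULL && child != NULL);
    child->estcpu = parent->estcpu;
    child->estcpu_inherited = parent->estcpu;
}

/*
 * At wait: charge the parent only for what the child accumulated itself.
 * Adding the child's full estcpu back would double the parent's value on
 * every fork-wait cycle.
 */
static void sk_wait_hook(struct sk_sproc *parent, const struct sk_sproc *child)
{
    assert(parent != NULL && child != NULL);
    if (child->estcpu > child->estcpu_inherited)
        parent->estcpu += child->estcpu - child->estcpu_inherited;
}
```

A child that exits without using any CPU leaves the parent's value unchanged in this sketch, whereas adding the full value back would have doubled it.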


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 Hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
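
A rough sketch of why a fixed-point p_estcpu helps, assuming the classic 4BSD decay factor of (2 * loadavg) / (2 * loadavg + 1) and an 11-bit fraction. The `sk_` names and the constants are assumptions for illustration, not the committed code.

```c
#include <assert.h>

/* Hypothetical fixed-point type with 11 fractional bits. */
typedef unsigned int sk_fixpt_t;
#define SK_FSHIFT 11
#define SK_FSCALE (1u << SK_FSHIFT)

/*
 * Decay estcpu by loadfac / (loadfac + 1.0), where loadfac is
 * 2 * loadavg expressed in fixed point. The higher the load, the
 * closer the factor is to 1 and the slower the decay.
 */
static sk_fixpt_t sk_decay(sk_fixpt_t estcpu, sk_fixpt_t loadfac)
{
    assert(loadfac + SK_FSCALE != 0);
    unsigned long long scaled = (unsigned long long)estcpu * loadfac;
    return (sk_fixpt_t)(scaled / (loadfac + SK_FSCALE));
}
```

With a load average of 16, an estcpu of 1.0 (2048 in this representation) decays to about 0.97 rather than dropping by a whole unit, so the roughly 16 increments per second spread over many processes are no longer wiped out each second.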


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
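
The marker-skipping iteration can be sketched as below. This is a minimal illustration only: the `sk_` names and the list layout are hypothetical, and the locking and the insertion of the iterator's own marker, which the real function needs, are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list entry: either a real process or a placeholder marker. */
struct sk_lproc {
    int pid;
    int is_marker;          /* nonzero for marker entries */
    struct sk_lproc *next;
};

typedef int (*sk_callback_t)(struct sk_lproc *, void *);

/*
 * Call cb for every real entry on the list, skipping markers.
 * Stops early and returns the callback's value if it is nonzero.
 */
static int sk_proclist_foreach_call(struct sk_lproc *head, sk_callback_t cb,
    void *arg)
{
    assert(cb != NULL);
    for (struct sk_lproc *p = head; p != NULL; p = p->next) {
        if (p->is_marker)
            continue;       /* markers are not processes */
        int rv = cb(p, arg);
        if (rv != 0)
            return rv;
    }
    return 0;
}

/* Example callback: accumulate the pids that were visited. */
static int sk_sum_pids(struct sk_lproc *p, void *arg)
{
    *(int *)arg += p->pid;
    return 0;
}
```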


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep while waiting for the debugger's
opinion about the signal without causing a problem. The hook is hence located
in issignal, at the place where SIGCHLD is normally sent to the debugger
while the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception) and stopping the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

If sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct emul to provide an emulation-specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and to synchronize modifications
to the VM space across share group members. We need an IRIX-specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
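
The dispatch described above (a NULL e_fault meaning "use the default handler") can be sketched roughly like this. The `sk_` names are hypothetical and the handler signatures are simplified to a single address argument, unlike the prototypes quoted in the message.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical fault handler signature. */
typedef int (*sk_fault_fn)(unsigned long va);

/* Hypothetical stand-in for the relevant part of struct emul. */
struct sk_emul {
    const char *e_name;
    sk_fault_fn e_fault;    /* NULL means: use the default handler */
};

/* Stands in for uvm_fault; returns 0 so callers can tell which one ran. */
static int sk_uvm_fault(unsigned long va)
{
    (void)va;
    return 0;
}

/* Stands in for an emulation-specific handler such as irix_vm_fault. */
static int sk_irix_vm_fault(unsigned long va)
{
    (void)va;
    return 1;
}

/* What a trap handler would do: prefer the emulation hook if present. */
static int sk_trap_fault(const struct sk_emul *e, unsigned long va)
{
    assert(e != NULL);
    if (e->e_fault != NULL)
        return e->e_fault(va);
    return sk_uvm_fault(va);
}
```

Keeping NULL as the default lets every emulation other than the one with a special handler leave the field unset, as the message notes.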


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specificities should now be handled in OS specific code. Linux
has been done, but other emulations still need to be handled. See case
LINUX_F_SETFL in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for
more details.

The data that has been collected so far:

                                   Net  Free  Open  Linux  SunOS  AIX  OSF1  Darwin
send SIGIO to write end of pipe    Y    N     N     N      N      N    Y     Y
send SIGIO to read end of pipe     Y    Y     N     N      N      ?    Y     ?
send SIGIO to write end of socket  Y    Y     Y     N      N      Y    Y     Y
send SIGIO to read end of socket   Y    Y     Y     Y      Y      ?    Y     ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke the
build of several emulation packages


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
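
A minimal sketch of the compatibility idea described above: the old
four-argument call becomes a macro that forwards to the new function with no
interlock. The `_sketch` names, the stub body, and the `simplelock_sketch`
type are assumptions made for illustration; this is not the kernel's actual
code.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's interlock type. */
struct simplelock_sketch {
	int locked;
};

/*
 * Stub for the new interface. A real implementation would lock the
 * scheduler first and only then drop the interlock, which is what
 * closes the sleeper/awakener race. Here we only model the release.
 */
static int
ltsleep_sketch(const void *ident, int priority, const char *wmesg,
    int timo, struct simplelock_sketch *interlock)
{
	(void)ident;
	(void)priority;
	(void)wmesg;
	(void)timo;
	if (interlock != NULL)
		interlock->locked = 0;	/* interlock released */
	return 0;
}

/* The old API expressed in terms of the new one: no interlock. */
#define tsleep_sketch(ident, pri, wmesg, timo) \
	ltsleep_sketch((ident), (pri), (wmesg), (timo), NULL)
```

Existing callers keep compiling unchanged, while new callers can pass the
lock that protects the condition they are waiting on.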


# 1.97 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.
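
The element-size argument mentioned for KERN_PROC2 can be illustrated with a
small user-space sketch. Assuming the producer copies at most the
caller-supplied size for each record, an older consumer with a shorter
structure still gets correctly strided data. `kinfo_sketch` and
`copyout_records_sketch` are hypothetical names, not the kernel's.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical "current" record layout; newer_field was added later. */
struct kinfo_sketch {
	int pid;
	int uid;
	int newer_field;
};

/*
 * Copy count records into dst using the caller's element size as the
 * stride. Each slot is zero-filled, then receives at most
 * sizeof(struct kinfo_sketch) bytes. Returns the bytes written.
 */
static size_t
copyout_records_sketch(void *dst, size_t elem_size,
    const struct kinfo_sketch *src, size_t count)
{
	size_t copy = elem_size < sizeof(*src) ? elem_size : sizeof(*src);
	unsigned char *out = dst;
	size_t i;

	for (i = 0; i < count; i++) {
		memset(out, 0, elem_size);
		memcpy(out, &src[i], copy);
		out += elem_size;
	}
	return count * elem_size;
}
```

A binary built against the two-field layout passes its own element size and
never sees the field that was appended afterwards.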


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.
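
One way to picture the tracking described above is a pair of flags: the
first round-robin tick records that the process has been seen, and a second
tick without an intervening switch marks it as one that should yield. The
flag and function names below are assumptions for illustration only.

```c
/* Hypothetical flag bits for the sketch. */
#define RR_SEEN_SKETCH		0x01	/* already seen by one tick */
#define RR_SHOULDYIELD_SKETCH	0x02	/* ran a full cycle, should yield */

struct rr_sketch {
	int flags;
};

/* Called once per round-robin interval for the running process. */
static void
rr_tick_sketch(struct rr_sketch *s)
{
	if (s->flags & RR_SEEN_SKETCH)
		s->flags |= RR_SHOULDYIELD_SKETCH;
	else
		s->flags |= RR_SEEN_SKETCH;
}

/* Called when the process gives up the CPU: start counting afresh. */
static void
rr_switched_sketch(struct rr_sketch *s)
{
	s->flags &= ~(RR_SEEN_SKETCH | RR_SHOULDYIELD_SKETCH);
}

static int
rr_should_yield_sketch(const struct rr_sketch *s)
{
	return (s->flags & RR_SHOULDYIELD_SKETCH) != 0;
}
```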


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
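
The sizing rule in this entry amounts to a divide followed by a clamp. The
sketch below assumes the bounds are given in pages and that the lower bound
is applied before the upper one; the function name is hypothetical.

```c
#include <stddef.h>

/*
 * Take a quarter of physical memory (in pages), then clamp the result
 * to the supplied lower and upper bounds.
 */
static size_t
nkmempages_sketch(size_t physpages, size_t min_pages, size_t max_pages)
{
	size_t n = physpages / 4;

	if (n < min_pages)
		n = min_pages;
	if (n > max_pages)
		n = max_pages;
	return n;
}
```

For example, with bounds of 100 and 2000 pages, a machine with 4000 physical
pages would get 1000, while very small or very large machines would be pinned
to the respective bound.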


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). A process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock every where else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
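
A predicate like the one described can be sketched as a macro over the state
field. The numeric state values and the `_SKETCH` names are assumptions
chosen for the example; only the idea of treating both states as "zombie"
comes from the entry above.

```c
/* Hypothetical state values; only the two relevant ones matter here. */
enum pstate_sketch {
	SRUN_SKETCH = 2,
	SZOMB_SKETCH = 5,	/* reaped by the reaper, ready for wait4 */
	SDEAD_SKETCH = 6	/* dead, but not yet handled by the reaper */
};

struct zproc_sketch {
	int p_stat;
};

/* True for a process in either of the two post-exit states. */
#define P_ZOMBIE_SKETCH(p) \
	((p)->p_stat == SZOMB_SKETCH || (p)->p_stat == SDEAD_SKETCH)
```

Callers that previously compared against the zombie state alone can use the
macro and remain correct during the window before the reaper runs.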


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is the
header files as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
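
The effect of biasing by NZERO can be sketched as follows: the user-visible
nice value is offset by 20 before being stored, so the stored field is never
negative and fits an unsigned char. The clamp range of -20..20 and all
`_sketch` names are assumptions for illustration.

```c
#define NZERO_SKETCH 20		/* bias, as in the entry above */

struct nice_sketch {
	unsigned char p_nice;	/* stored biased, always >= 0 */
};

/* Store a user-visible nice value, assumed to lie within -20..20. */
static void
set_nice_sketch(struct nice_sketch *p, int nice)
{
	if (nice < -20)
		nice = -20;
	if (nice > 20)
		nice = 20;
	p->p_nice = (unsigned char)(nice + NZERO_SKETCH);
}

/* Recover the user-visible value by removing the bias. */
static int
get_nice_sketch(const struct nice_sketch *p)
{
	return (int)p->p_nice - NZERO_SKETCH;
}
```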


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.
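
A hold count differs from a flag in that overlapping holders do not undo each
other: the process becomes swappable again only when the last holder
releases. The sketch below uses hypothetical names throughout.

```c
struct hold_sketch {
	int p_holdcnt;		/* number of outstanding holds */
};

/* Take a hold: the process must stay in core. */
static void
phold_sketch(struct hold_sketch *p)
{
	p->p_holdcnt++;
}

/* Drop a hold; guarded so an unbalanced release cannot go negative. */
static void
prele_sketch(struct hold_sketch *p)
{
	if (p->p_holdcnt > 0)
		p->p_holdcnt--;
}

/* The process may be swapped out only when nobody holds it. */
static int
swappable_sketch(const struct hold_sketch *p)
{
	return p->p_holdcnt == 0;
}
```

With a single flag, the first of two overlapping users to finish would clear
it and expose the other; the counter avoids that.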


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location , and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


Revision tags: isaki-audio2-base
# 1.352 06-Apr-2019 kamil

Centralized shared part of child_return() into MI part

Add a new function md_child_return() for MD specific bits only.

New child_return() is now part of MI and central code that handles
uniformly tracing code (KTR and ptrace(2)).

Synchronize value passed to ktrsysret() among ports to SYS_fork. This is
a traditional value and accessing p_lflag to check for PL_PPWAIT shall
use locking against proc_lock. Returning SYS_fork vs SYS_vfork still isn't
correct enough as there are more entry points to forking code. Instead of
making it too good, just settle with plain SYS_fork for all ports.


# 1.351 01-Mar-2019 christos

PR/53998: Joel Bertrand: Limit the number of semaphores on a
per-user basis not a per-process. We cannot really keep track on
a per-process basis because a parent process can create the semaphore
and a child can free it taking credit for it. There is also a
similar issue about resource exhaustion if we limited the number
of lwps per process as opposed to per user (which we don't).


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.350 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
    0: no access
    1: access to processes with open /dev/kmem
    2: access to everyone
  defaults:
    0: KASLR kernels
    1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
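
The tri-state policy above can be sketched in C as follows. This is only an
illustration: every sketch_* name is hypothetical and is not the kernel's
actual code. It assumes the caller's "/dev/kmem is open" state is already
known as a boolean, and it makes the decision once per request rather than
once per process, as the entry describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the three states listed in the entry. */
enum sketch_expose {
	SKETCH_EXPOSE_NONE = 0,	/* no access */
	SKETCH_EXPOSE_KMEM = 1,	/* only callers with /dev/kmem open */
	SKETCH_EXPOSE_ALL  = 2	/* everyone */
};

/* Default per the entry: 0 for KASLR kernels, 1 otherwise. */
int
sketch_default_mode(bool kaslr)
{
	return kaslr ? SKETCH_EXPOSE_NONE : SKETCH_EXPOSE_KMEM;
}

/* Decide whether kernel addresses may be shown to this caller. */
bool
sketch_may_expose(int mode, bool caller_has_kmem_open)
{
	switch (mode) {
	case SKETCH_EXPOSE_ALL:
		return true;
	case SKETCH_EXPOSE_KMEM:
		return caller_has_kmem_open;
	default:
		return false;
	}
}

/* Copy n addresses, zeroing them all if the caller may not see them. */
void
sketch_fill_addrs(const uintptr_t *kaddrs, uintptr_t *out, size_t n,
    int mode, bool caller_has_kmem_open)
{
	assert(kaddrs != NULL && out != NULL);
	/* Evaluate the policy once per request, not once per entry. */
	const bool expose = sketch_may_expose(mode, caller_has_kmem_open);

	for (size_t i = 0; i < n; i++)
		out[i] = expose ? kaddrs[i] : 0;
}
```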


Revision tags: pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.349 10-Aug-2018 pgoyette

Allow syscall_establish() to install new syscalls when the existing
entry-point is either sys_nomodule or sys_nosys. Update the
makesyscalls.sh script to create a const array of bits to allow
syscall_disestablish() to properly restore the original entry-point.
Update all the initializers of struct emul to initialize the pointer
to the bit array struct emul.

XXX Regen of all files created by makesyscalls.sh will come soon,
XXX followed by a kernel version bump (since struct emul is being
XXX modified).

This commit should address PR kern/45781 and also removes the need
for the work-around for that PR in file

sys/arch/usermode/modules/syscallemu/syscallemu.c
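
A minimal sketch of the establish/disestablish behaviour described at the
top of this entry, assuming a tiny eight-slot table. All sketch_* names, the
table contents and the return values are hypothetical; the real
syscall_establish() and the generated bit array differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NSYS 8

typedef int (*sketch_call_t)(void);

/* Stand-ins for the two placeholder entry points named in the entry. */
int sketch_nosys(void)    { return -1; }
int sketch_nomodule(void) { return -2; }
/* An example handler that a module might install. */
int sketch_hello(void)    { return 42; }

static sketch_call_t sketch_table[SKETCH_NSYS] = {
	sketch_nosys, sketch_nomodule, sketch_nosys, sketch_nomodule,
	sketch_nosys, sketch_nosys,    sketch_nosys, sketch_nosys,
};

/* Bit n set means slot n originally pointed at sketch_nomodule. */
static const uint32_t sketch_nomodbits[1] = { (1u << 1) | (1u << 3) };

/* Install fn only if the slot still holds one of the placeholders. */
bool
sketch_establish(unsigned n, sketch_call_t fn)
{
	if (n >= SKETCH_NSYS)
		return false;
	if (sketch_table[n] != sketch_nosys &&
	    sketch_table[n] != sketch_nomodule)
		return false;
	sketch_table[n] = fn;
	return true;
}

/* Restore whichever placeholder the slot held originally. */
void
sketch_disestablish(unsigned n)
{
	assert(n < SKETCH_NSYS);
	const bool was_nomodule =
	    ((sketch_nomodbits[n / 32] >> (n % 32)) & 1u) != 0;
	sketch_table[n] = was_nomodule ? sketch_nomodule : sketch_nosys;
}

int
sketch_invoke(unsigned n)
{
	assert(n < SKETCH_NSYS);
	return sketch_table[n]();
}
```

The constant bit array is what lets the restore step distinguish the two
placeholders without keeping a second copy of the table.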


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.348 09-May-2018 kre

Cause a process's user and system times to become non-decreasing.

This alters the invented values (ie: statistically calculated)
that are returned - for small values, the values are likely going to
be different than they were, but that's largely nonsense anyway
(except that the sum of utime & stime does equal cpu time consumed
by the process). Once the values get large enough to be meaningful
the difference made by this change will be in the noise, and irrelevant.

This needs a couple of additions to struct proc, so we are now into 8.99.17


# 1.347 06-May-2018 kamil

Remove an element from struct emul: e_tracesig

e_tracesig used to be implemented for Darwin compat. Nowadays the Darwin
compatibility layer is gone and there are no other users.

This functionality isn't used where it shall be used in the existing
codebase.

If we want to emulate debugging interfaces in compat layers we would need
to implement that from scratch anyway. We would need to be bug compatible
with other OSes too.

Proposed on tech-kern@.

Welcome to NetBSD 8.99.16!

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422
# 1.346 19-Apr-2018 christos

s/static inline/static __inline/g for consistency with other include
headers.


# 1.345 16-Apr-2018 kamil

Remove the rnewprocp argument from fork1(9)

It's now unused and it can cause use-after-free scenarios as noted by
<Mateusz Guzik>.

Reference: http://mail-index.netbsd.org/tech-kern/2017/09/08/msg022267.html

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.344 09-Jan-2018 maya

branches: 1.344.2;
remove struct emul's e_fault.

It used to be used by COMPAT_IRIX for the purpose of overriding
uvm_fault (only implemented in MIPS), now removed.

Ride 8.99.12 version bump.


Revision tags: tls-maxphys-base-20171202
# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->p_path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

branches: 1.340.6;
factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API and its usage.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, so remove its traces without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
        int     pe_report_event;
        pid_t   pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
        int     pe_report_event;
        union {
                pid_t   _pe_other_pid;
                lwpid_t _pe_lwp;
        } _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
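
The size claim above can be checked with a small C sketch. It assumes, as
the entry states, that both ID types are 32-bit integers; the sketch_* and
old_/new_ names are hypothetical stand-ins so that the example builds without
the system headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: pid_t and lwpid_t are both int32_t-like. */
typedef int32_t sketch_pid_t;
typedef int32_t sketch_lwpid_t;

/* Layout before the change. */
struct old_ptrace_state {
	int		pe_report_event;
	sketch_pid_t	pe_other_pid;
};

/* Layout after the change. */
struct new_ptrace_state {
	int		pe_report_event;
	union {
		sketch_pid_t	_pe_other_pid;
		sketch_lwpid_t	_pe_lwp;
	} _option;
};

/* Same size and same member offset, so old binaries keep working. */
static_assert(sizeof(struct old_ptrace_state) ==
    sizeof(struct new_ptrace_state), "size must not change");
static_assert(offsetof(struct old_ptrace_state, pe_other_pid) ==
    offsetof(struct new_ptrace_state, _option), "offset must not change");

/* Accessor macros, defined after the checks so they do not rewrite them. */
#define pe_other_pid	_option._pe_other_pid
#define pe_lwp		_option._pe_lwp

/* Store through one macro and read back through the other. */
int32_t
sketch_roundtrip(int32_t id)
{
	struct new_ptrace_state st = { 0 };

	st.pe_other_pid = id;
	return st.pe_lwp;	/* same storage, viewed as an LWP id */
}
```

Existing source that uses st.pe_other_pid still compiles because the macro
expands to the union member.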


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP termination is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
parent gives birth to new process child and stops till it exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE notifies a debugger that a parent has
resumed after vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
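
A hedged sketch of what such a macro can look like: pick the pointer size
from a per-process "32-bit" flag. The SKETCH_* names, the flag value and the
struct are hypothetical and are not copied from <sys/proc.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PK_32	0x00000004	/* hypothetical "32-bit process" flag */

struct sketch_proc {
	int	p_flag;
};

/* Pointer size as seen by the process: 4 bytes if 32-bit, else native. */
#define SKETCH_PROC_PTRSZ(p) \
	(((p)->p_flag & SKETCH_PK_32) ? sizeof(uint32_t) : sizeof(void *))

/* Bytes needed for a vector of n user pointers in process p. */
size_t
sketch_ptrvec_bytes(const struct sketch_proc *p, size_t n)
{
	assert(p != NULL);
	return n * SKETCH_PROC_PTRSZ(p);
}
```

Code that walks argv-style vectors in the target process would use the macro
for its stride instead of sizeof(void *).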


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
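
To illustrate the split, the sketch below rebuilds a composite wait(2)-style
status from the three separate pieces. It assumes the traditional encoding
(exit code in bits 8-15, signal number in the low 7 bits, 0x80 as the core
bit); the sketch_* names are hypothetical and are not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_COREFLAG	0x80	/* assumed traditional core-dump bit */

/* Hypothetical stand-in for the three pieces the entry describes. */
struct sketch_exit_info {
	int	xexit;		/* exit code (cf. p_xexit) */
	int	xsig;		/* terminating signal, 0 if none (cf. p_xsig) */
	bool	coredump;	/* core-dump indication */
};

/* Compose the status value a wait(2) caller would decode. */
int
sketch_wait_status(const struct sketch_exit_info *ei)
{
	assert(ei != NULL);
	if (ei->xsig != 0) {
		/* Killed by a signal: low bits carry the signal number. */
		return (ei->xsig & 0x7f) |
		    (ei->coredump ? SKETCH_COREFLAG : 0);
	}
	/* Normal exit: the code lives in the second byte. */
	return (ei->xexit & 0xff) << 8;
}
```

Keeping the fields apart means each consumer reads exactly the piece it
needs, and the composite value is built only at the reporting boundary.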


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.
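
As an illustration only, a process counter maintained with atomics can be
sketched with C11 <stdatomic.h>. This swaps in the standard C interface
rather than the kernel's own atomic operations, and the sketch_* names are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical counter; zero-initialized as a static object. */
static atomic_uint sketch_nprocs;

/* Called when a process is created; returns the new count. */
unsigned
sketch_proc_created(void)
{
	return atomic_fetch_add(&sketch_nprocs, 1) + 1;
}

/* Called when a process is destroyed; returns the new count. */
unsigned
sketch_proc_destroyed(void)
{
	unsigned old = atomic_fetch_sub(&sketch_nprocs, 1);

	assert(old > 0);	/* the count must never underflow */
	return old - 1;
}

/* Lock-free read of the current count. */
unsigned
sketch_nprocs_read(void)
{
	return atomic_load(&sketch_nprocs);
}
```

No lock is needed around the updates because each read-modify-write is a
single atomic operation.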


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.
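
The log does not say how the sub-microsecond case turned into an untimed
sleep. One plausible mechanism, sketched here purely as an assumption, is
a truncating unit conversion combined with a convention that a timeout of
0 means "no timeout". All names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical convention for this sketch: 0 means "sleep until woken". */
#define NO_TIMEOUT_SKETCH 0u

/* Truncating conversion: any delay of 1..999 ns collapses to 0. */
static uint64_t
ns_to_us_truncate(uint64_t ns)
{
	return ns / 1000;
}

/* Rounding up keeps every nonzero delay nonzero. */
static uint64_t
ns_to_us_roundup(uint64_t ns)
{
	assert(ns <= UINT64_MAX - 999);	/* guard the addition against overflow */
	return (ns + 999) / 1000;
}
```

With the truncating version a 500 ns delay becomes 0 and would be read as
"no timeout"; the rounding version turns it into 1 microsecond instead.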


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give the same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time.


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's a more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but do no more.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem in which p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is not likely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
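
The effect described in this change can be sketched with a small
comparison of integer and fixed-point decay. This assumes the textbook
4BSD decay factor 2*load / (2*load + 1); the names and the number of
fractional bits are hypothetical and not taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define FSHIFT_SKETCH 11			/* fractional bits (assumed) */
#define FSCALE_SKETCH (1u << FSHIFT_SKETCH)

typedef uint32_t fixpt_sketch_t;

/* Integer estcpu: truncation removes at least 1 on every decay step. */
static uint32_t
decay_int(uint32_t estcpu, uint32_t load)
{
	assert(load > 0);
	return estcpu * (2 * load) / (2 * load + 1);
}

/* Fixed-point estcpu: the fractional part survives the decay step. */
static fixpt_sketch_t
decay_fixpt(fixpt_sketch_t estcpu, uint32_t load)
{
	assert(load > 0);
	return (fixpt_sketch_t)((uint64_t)estcpu * (2 * load) / (2 * load + 1));
}
```

At a load of 20 the integer form takes 10 down to 9 and 1 down to 0, while
the fixed-point form keeps 10.0 at roughly 9.76, so a high load average can
slow the decay as the commit message intends.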


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
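
A minimal sketch of a marker-skipping iteration macro, using a hand-rolled
singly linked list rather than the real proclist. The structure, the
is_marker flag and every name below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pnode_sketch {
	struct pnode_sketch *next;
	bool is_marker;		/* placeholder entry, not a real process */
	int pid;
};

/* Return the first non-marker node at or after n, or NULL. */
static struct pnode_sketch *
skip_markers_sketch(struct pnode_sketch *n)
{
	while (n != NULL && n->is_marker)
		n = n->next;
	return n;
}

/* Iterate over real entries only, stepping past any markers. */
#define FOREACH_SKIP_MARKERS_SKETCH(var, head)				\
	for ((var) = skip_markers_sketch(head); (var) != NULL;		\
	    (var) = skip_markers_sketch((var)->next))

/* Example consumer: add up the pids of all real entries. */
static int
sum_pids_sketch(struct pnode_sketch *head)
{
	struct pnode_sketch *p;
	int sum = 0;

	FOREACH_SKIP_MARKERS_SKETCH(p, head) {
		assert(!p->is_marker);	/* the macro never yields a marker */
		sum += p->pid;
	}
	return sum;
}
```

The loop body never has to test for markers itself, which is the
convenience the commit message describes for PROCLIST_FOREACH.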


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep awaiting the debugger's opinion
about the signal without it being a problem. The hook is hence located in
issignal, at the place where normally SIGCHLD is sent to the debugger
and the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception) and stopping the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.
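
The powerpc/oea item above describes a per-segment count of executable
mappings, with the no-exec segment bit set iff the count is 0. A minimal
sketch of that bookkeeping follows; the structure and function names are
hypothetical and the bool merely stands in for the hardware bit.

```c
#include <assert.h>
#include <stdbool.h>

struct segment_sketch {
	unsigned int exec_count;	/* executable mappings in this segment */
	bool noexec;			/* stand-in for the hardware no-exec bit */
};

/* A fresh segment has no executable mappings, so it starts no-exec. */
static void
seg_init_sketch(struct segment_sketch *s)
{
	s->exec_count = 0;
	s->noexec = true;
}

/* Adding an executable mapping clears the no-exec bit. */
static void
seg_map_exec_sketch(struct segment_sketch *s)
{
	s->exec_count++;
	s->noexec = false;
}

/* Removing the last executable mapping sets the no-exec bit again. */
static void
seg_unmap_exec_sketch(struct segment_sketch *s)
{
	assert(s->exec_count > 0);
	if (--s->exec_count == 0)
		s->noexec = true;
}
```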


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
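
To illustrate the shape of the change (a type descriptor passed by pointer
instead of an int constant), here is a user-space sketch built on the
standard malloc(). It is not the kernel interface; every name, including
the definition macro and the limit-setting function, is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct malloc_type_sketch {
	const char *name;	/* human-readable type name */
	size_t limit;		/* maximum bytes allowed for this type */
	size_t inuse;		/* bytes currently allocated */
};

/* Define a type and set its limit at definition time. */
#define MALLOC_DEFINE_SKETCH(var, nm, lim)				\
	static struct malloc_type_sketch var = { (nm), (lim), 0 }

MALLOC_DEFINE_SKETCH(M_DEMO_SKETCH, "demo", 64);

/* Adjust the limit later with a function call. */
static void
malloc_type_setlimit_sketch(struct malloc_type_sketch *t, size_t limit)
{
	t->limit = limit;
}

/* Allocate against a type; fails with NULL once the limit is reached. */
static void *
malloc_typed_sketch(struct malloc_type_sketch *t, size_t size)
{
	void *p;

	assert(t != NULL);
	if (t->inuse > t->limit || size > t->limit - t->inuse)
		return NULL;
	p = malloc(size);
	if (p != NULL)
		t->inuse += size;
	return p;
}

/* Release memory and return its bytes to the type's budget. */
static void
free_typed_sketch(struct malloc_type_sketch *t, void *p, size_t size)
{
	assert(size <= t->inuse);
	free(p);
	t->inuse -= size;
}
```

Because the type is a structure rather than an index into a fixed table,
new types can be declared wherever they are needed.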


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)
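
A sketch of what replacing the magic number might look like. The log only
says that '500' in the pid allocation code became PID_SKIP; treating it as
the restart point after wrap-around, the value of the maximum, and all
names below are assumptions made for illustration.

```c
#include <assert.h>

#define PID_MAX_SKETCH	30000	/* assumed upper bound for pids */
#define PID_SKIP_SKETCH	500	/* named constant instead of a bare 500 */

/* Pick the next candidate pid, skipping the low range after wrap-around. */
static int
next_pid_sketch(int last)
{
	int next;

	assert(last >= 0 && last < PID_MAX_SKETCH);
	next = last + 1;
	if (next >= PID_MAX_SKETCH)
		next = PID_SKIP_SKETCH;
	return next;
}
```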


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct emul to provide an emulation-specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX-specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
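
The "NULL means use the default handler" convention can be sketched as
follows. The types and handlers here are simplified, hypothetical
stand-ins; the real prototypes are the ones quoted in the commit message.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*fault_fn_sketch)(unsigned long va);

struct emul_sketch {
	const char *e_name;
	fault_fn_sketch e_fault;	/* NULL: use the default handler */
};

/* Stand-in for the generic fault handler. */
static int
default_fault_sketch(unsigned long va)
{
	(void)va;
	return 0;
}

/* Stand-in for an emulation-specific fault handler. */
static int
custom_fault_sketch(unsigned long va)
{
	(void)va;
	return 1;
}

/* What a trap handler might do: prefer the emulation's hook if set. */
static int
dispatch_fault_sketch(const struct emul_sketch *e, unsigned long va)
{
	assert(e != NULL);
	if (e->e_fault != NULL)
		return e->e_fault(va);
	return default_fault_sketch(va);
}
```

Only emulations that need special handling fill in the hook; every other
emulation leaves it NULL and gets the generic path.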


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.
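
As a rough illustration of the versioned-trampoline idea described above, the following sketch pairs a handler with a trampoline pointer and version, and falls back to a kernel-provided trampoline for version 0. The names `sigdesc_sketch` and `select_trampoline` and the field layout are hypothetical; this is not the kernel's actual sigact_sigdesc definition or sendsig() logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-signal descriptor. */
struct sigdesc_sketch {
	void (*sd_handler)(int);	/* signal handler */
	const void *sd_tramp;		/* userland-provided trampoline, if any */
	int sd_vers;			/* 0 = legacy kernel-provided trampoline */
};

/*
 * Pick the trampoline for a signal: version 0 falls back to the
 * kernel-provided one, any other version uses the registered pointer.
 */
static const void *
select_trampoline(const struct sigdesc_sketch *sd, const void *kernel_tramp)
{
	assert(sd != NULL);
	if (sd->sd_vers == 0)
		return kernel_tramp;
	return sd->sd_tramp;
}
```

Keeping the version next to the trampoline pointer lets the lookup that already has to find the handler also decide which trampoline to use.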


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed the obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
OS-specific async I/O behavior should now be handled in OS-specific code. Linux
has been done, but other emulations still need to be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data collected so far:

                                   Net    Free   Open   Linux  SunOS  AIX    OSF1   Darwin
send SIGIO to write end of pipe    Y      N      N      N      N      N      Y      Y
send SIGIO to read end of pipe     Y      Y      N      N      N      ?      Y      ?
send SIGIO to write end of socket  Y      Y      Y      N      N      Y      Y      Y
send SIGIO to read end of socket   Y      Y      Y      Y      Y      ?      Y      ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke the build
of several emulation packages


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data has been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
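
A minimal sketch of how a compatibility macro can preserve the old API on top of an interlock-taking function is shown below. The `_sketch` names and the stub body are assumptions made for illustration: the stub does not block or touch any scheduler state, it only records whether an interlock was passed and releases it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, greatly simplified lock type. */
struct simplelock_sketch {
	int locked;
};

/* Records whether the most recent call was given an interlock. */
static int last_had_interlock;

/*
 * Stub standing in for an ltsleep()-style function. A real
 * implementation would put the caller to sleep; this one only models
 * the interlock hand-off and returns 0.
 */
static int
ltsleep_sketch(const void *chan, int pri, const char *wmesg, int timo,
    struct simplelock_sketch *interlock)
{
	(void)chan; (void)pri; (void)wmesg; (void)timo;
	last_had_interlock = (interlock != NULL);
	if (interlock != NULL) {
		assert(interlock->locked);
		interlock->locked = 0;	/* released on behalf of the caller */
	}
	return 0;
}

/* Old-style API: same arguments as before, no interlock. */
#define tsleep_sketch(chan, pri, wmesg, timo) \
	ltsleep_sketch((chan), (pri), (wmesg), (timo), NULL)
```

Callers of the old interface compile unchanged, while new callers can pass a held lock so that it is dropped only after the sleep has been set up.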


# 1.97 31-May-2000 thorpej

Track which CPU a process is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.
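
The element-size argument described for KERN_PROC2 can be illustrated with a small sketch: the caller states how large each record should be, and the producer copies only the leading part of its (possibly larger) structure. Everything here, including `kinfo_sketch`, its fields, and `copy_records`, is hypothetical and is not the kernel's sysctl code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record; ki_newer stands in for a field added later. */
struct kinfo_sketch {
	int ki_pid;
	int ki_uid;
	int ki_newer;
};

/*
 * Fill `dst` with up to `count` records of `elemsize` bytes each.
 * Each record receives the leading min(elemsize, sizeof(struct))
 * bytes of the source; any remainder is zero-filled.
 * Returns the number of records written.
 */
static size_t
copy_records(void *dst, size_t elemsize, size_t count,
    const struct kinfo_sketch *src, size_t nsrc)
{
	unsigned char *out = dst;
	size_t n = count < nsrc ? count : nsrc;
	size_t ncopy = elemsize < sizeof(*src) ? elemsize : sizeof(*src);
	size_t i;

	assert(dst != NULL && src != NULL);
	for (i = 0; i < n; i++) {
		memset(out + i * elemsize, 0, elemsize);
		memcpy(out + i * elemsize, &src[i], ncopy);
	}
	return n;
}
```

This only works if new fields are appended at the end of the structure, so that an old binary asking for the old, smaller size still sees the fields it knows at the same offsets.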


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
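
The sizing rule above (physical memory divided by 4, then clamped to bounds) amounts to the following arithmetic. The function name and parameters are made up for this sketch, and the real kernel code may differ in units and details.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: take a quarter of physical memory (in pages)
 * and clamp the result to [min_pages, max_pages].
 */
static size_t
nkmempages_autosize(size_t physpages, size_t min_pages, size_t max_pages)
{
	size_t n;

	assert(min_pages <= max_pages);
	n = physpages / 4;
	if (n < min_pages)
		n = min_pages;
	if (n > max_pages)
		n = max_pages;
	return n;
}
```

In this sketch a machine with 4000 pages and bounds of 100 and 2000 would get 1000 pages, while very small or very large machines would be pinned to the lower or upper bound.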


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). Process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
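
The state handling described above can be sketched as follows. The state names come from this log, but the numeric values, the `proc_sketch` structure, and the `_SKETCH`/`_sketch` helpers are assumptions for illustration only, not the kernel's definitions.

```c
#include <assert.h>

/* State names from the log; the numeric values are illustrative only. */
enum proc_state_sketch {
	SRUN = 1,
	SSTOP,
	SZOMB,
	SDEAD
};

/* Hypothetical, minimal process structure. */
struct proc_sketch {
	int p_stat;
};

/* Treat a process in either SZOMB or SDEAD as a zombie. */
#define P_ZOMBIE_SKETCH(p) \
	((p)->p_stat == SZOMB || (p)->p_stat == SDEAD)

/* Reaper step: a dead process becomes a zombie that wait4 can collect. */
static void
reap_sketch(struct proc_sketch *p)
{
	assert(p->p_stat == SDEAD);
	p->p_stat = SZOMB;
}
```

Code that only cares whether a process is "gone" can use the combined test, while the reaper and wait4 path distinguish the two states.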


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).
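
A small sketch of the exit-signal rule described above: an exit signal of 0 means no signal is delivered, and reparenting to init switches the exit signal to SIGCHLD. The structure and function names are hypothetical and do not reflect the actual P_EXITSIG() macro or the kernel's reparenting code.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical, minimal process structure. */
struct exitsig_proc_sketch {
	int p_exitsig;		/* signal sent to the parent on exit; 0 = none */
	int p_parent_is_init;	/* nonzero once reparented to init */
};

/* Signal to deliver to the parent at exit time, or 0 for none. */
static int
exit_signal_sketch(const struct exitsig_proc_sketch *p)
{
	assert(p != NULL);
	return p->p_exitsig;
}

/* Reparent to init: init expects SIGCHLD, so fix the signal up here. */
static void
reparent_to_init_sketch(struct exitsig_proc_sketch *p)
{
	assert(p != NULL);
	p->p_parent_is_init = 1;
	p->p_exitsig = SIGCHLD;
}
```

Doing the SIGCHLD fix-up once at reparenting time keeps the lookup itself a plain field read.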


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is all the
header files as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
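
The offset scheme above can be sketched as follows, assuming a user-visible nice range of -20 to 20 (that range and the helper names are assumptions of this sketch, not taken from the commit). Storing nice + NZERO keeps the stored value non-negative, so it fits in an unsigned char.

```c
#include <assert.h>

#define NZERO_SKETCH 20	/* offset added to the user-visible nice value */

/* Convert a user-visible nice value to its stored, non-negative form. */
static unsigned char
nice_to_pnice(int nice)
{
	assert(nice >= -20 && nice <= 20);	/* assumed valid range */
	return (unsigned char)(nice + NZERO_SKETCH);
}

/* Convert the stored form back to the user-visible nice value. */
static int
pnice_to_nice(unsigned char pnice)
{
	return (int)pnice - NZERO_SKETCH;
}
```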


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location, and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.344 09-Jan-2018 maya

remove struct emul's e_fault.

It used to be used by COMPAT_IRIX for the purpose of overriding
uvm_fault (only implemented in MIPS), now removed.

Ride 8.99.12 version bump.


Revision tags: tls-maxphys-base-20171202
# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->p_path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change affects neither other procfs files nor the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: matt-nb8-mediatek-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API and its usage.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, so all traces of it are removed without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event;
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
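
The size argument above can be checked with a small standalone sketch. The
demo_ types below are hypothetical stand-ins, both assumed to be int32_t as
the text states for pid_t and lwpid_t; this is not the kernel header itself.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <string.h>   /* memcpy */

/* Hypothetical stand-ins, assumed to be 32-bit integers. */
typedef int32_t demo_pid_t;
typedef int32_t demo_lwpid_t;

/* Layout before the change. */
struct demo_state_old {
	int pe_report_event;
	demo_pid_t pe_other_pid;
};

/* Layout after the change: the second member becomes a union. */
struct demo_state_new {
	int pe_report_event;
	union {
		demo_pid_t _pe_other_pid;
		demo_lwpid_t _pe_lwp;
	} _option;
};

/* Compile-time checks: same size, same offset for the second member. */
static_assert(sizeof(struct demo_state_old) == sizeof(struct demo_state_new),
    "size must not change");
static_assert(offsetof(struct demo_state_old, pe_other_pid) ==
    offsetof(struct demo_state_new, _option),
    "member offset must not change");

/*
 * Fill in the new layout, then read it back through the old layout,
 * as a prebuilt binary compiled against the old header would.
 */
static demo_pid_t
demo_roundtrip(demo_pid_t pid)
{
	struct demo_state_new n;
	struct demo_state_old o;

	memset(&n, 0, sizeof(n));
	n._option._pe_other_pid = pid;
	memcpy(&o, &n, sizeof(o));
	return o.pe_other_pid;
}
```

If both static assertions hold, code built against the old layout reads the
same bytes it did before, which is the compatibility property claimed above.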


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP termination is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
parent gives birth to new process child and stops till it exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE notifies a debugger that a parent has
resumed after vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
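
A minimal sketch of how the three separate pieces could be recombined into
a wait(2)-style status word. The demo_ names and the constant are
illustrative assumptions following the traditional encoding (exit code in
bits 8-15, signal in the low 7 bits, core flag 0x80); this is not the
kernel's own code.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value of the traditional core-dump bit. */
#define DEMO_COREFLAG 0x80

/*
 * Build a composite status from the split fields:
 *   xexit      - exit code (used only when xsig == 0)
 *   xsig       - terminating signal number, 0 if the process exited
 *   coredumped - whether a core file was written
 */
static int
demo_wait_status(int xexit, int xsig, bool coredumped)
{
	assert(xsig >= 0 && xsig < DEMO_COREFLAG);

	if (xsig != 0)
		return xsig | (coredumped ? DEMO_COREFLAG : 0);
	return (xexit & 0xff) << 8;
}
```

Keeping the fields apart means each consumer reads the one it needs, and the
composite value is only assembled at the point where userland expects it.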


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adopt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits execution of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.
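
A minimal sketch of the per-LWP credential reference described above. The structures and the function name are hypothetical stand-ins, and the locking that real kernel code needs around the reference counts is deliberately left out.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct xc_cred {
    int cr_refcnt;              /* number of holders */
    int cr_uid;
};

struct xc_proc {
    struct xc_cred *p_cred;     /* current process credentials */
};

struct xc_lwp {
    struct xc_proc *l_proc;
    struct xc_cred *l_cred;     /* reference held by this LWP */
};

/*
 * Called on syscall entry or user trap: if the process credentials
 * changed since this LWP last looked, move the LWP's reference over.
 */
static void
xc_lwp_update_creds(struct xc_lwp *l)
{
    struct xc_cred *newcred = l->l_proc->p_cred;

    if (l->l_cred == newcred)
        return;                 /* common case: nothing changed */

    newcred->cr_refcnt++;       /* take a reference on the new creds */
    l->l_cred->cr_refcnt--;     /* drop the reference on the old ones */
    l->l_cred = newcred;
}
```

The point of the design is that code running on behalf of the LWP can read l_cred without looking at the process structure, and the check on entry is a single pointer comparison in the common case.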


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout from proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but no longer do.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.
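
A simplified sketch of the bookkeeping described above, assuming hypothetical structure and function names. The decay of the inherited value that the first item mentions is omitted to keep the example short.

```c
/* Hypothetical per-process scheduler state. */
struct xe_sched {
    unsigned estcpu;            /* estimated CPU usage */
    unsigned estcpu_inherited;  /* portion copied from the parent at fork */
};

/* At fork: the child starts with the parent's estcpu and remembers it. */
static void
xe_fork_hook(const struct xe_sched *parent, struct xe_sched *child)
{
    child->estcpu = parent->estcpu;
    child->estcpu_inherited = parent->estcpu;
}

/*
 * At wait: charge the parent only for what the child accumulated on
 * its own, not for the part it inherited from the parent.
 */
static void
xe_wait_hook(struct xe_sched *parent, const struct xe_sched *child)
{
    if (child->estcpu > child->estcpu_inherited)
        parent->estcpu += child->estcpu - child->estcpu_inherited;
}
```

Adding the child's full estcpu back instead would turn a parent value of 100 into 200 after one fork-wait cycle even if the child did no work, which is the doubling the commit describes.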


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
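
A small sketch of why a fixed-point p_estcpu decays more gracefully under load. It uses a decay factor of 2*load / (2*load + 1) in the style of the traditional 4.4BSD filter; the names, the plain integer load value, and the choice of 11 fractional bits are assumptions of this example rather than the code from the commit.

```c
#include <stdint.h>

#define X_FSHIFT 11                     /* fractional bits (assumed) */
#define X_FSCALE (1u << X_FSHIFT)

typedef uint32_t x_fixpt_t;

/* Integer estcpu: small values lose at least 1 on every decay pass. */
static unsigned
x_decay_int(unsigned estcpu, unsigned load)
{
    return estcpu * (2 * load) / (2 * load + 1);
}

/* Fixed-point estcpu: the fraction survives, so decay can be gradual. */
static x_fixpt_t
x_decay_fixpt(x_fixpt_t estcpu, unsigned load)
{
    return (x_fixpt_t)((uint64_t)estcpu * (2 * load) / (2 * load + 1));
}
```

At a load of 20, an integer estcpu of 1 decays straight to 0, while the same value stored as 1 * X_FSCALE only drops from 2048 to 1998.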


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
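
A minimal sketch of the marker-skipping iteration described above. The structure, flag, and macro names are hypothetical stand-ins, not the definitions in <sys/proc.h>, and a hand-rolled singly linked list is used instead of the <sys/queue.h> macros.

```c
#include <stddef.h>

/* Hypothetical stand-in for an entry on a process list. */
struct xm_proc {
    struct xm_proc *xp_next;    /* next entry on the list */
    int xp_flag;                /* XM_MARKER for placeholder entries */
    int xp_pid;
};

#define XM_MARKER 0x1

/* Return the first non-marker entry at or after p, or NULL. */
static struct xm_proc *
xm_skip_markers(struct xm_proc *p)
{
    while (p != NULL && (p->xp_flag & XM_MARKER) != 0)
        p = p->xp_next;
    return p;
}

/* Like a plain list walk, but never hands a marker to the loop body. */
#define XM_PROCLIST_FOREACH(var, head)                  \
    for ((var) = xm_skip_markers(head);                 \
        (var) != NULL;                                  \
        (var) = xm_skip_markers((var)->xp_next))

/* Example user: count the real entries on a list. */
static int
xm_count(struct xm_proc *head)
{
    struct xm_proc *p;
    int n = 0;

    XM_PROCLIST_FOREACH(p, head)
        n++;
    return n;
}
```

A marker lets an iterator remember its position in the list while it drops the list lock to call a function; every other walker has to ignore such entries, which is what the macro takes care of.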


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep awaiting the debugger's opinion
about the signal; this is not a problem. The hook is hence located in
issignal, at the place where normally SIGCHLD is sent to the debugger,
while the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception), and stop the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
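
A rough sketch of the idea of a malloc type as a structure with an adjustable limit. Every name here (the structure, its fields, the macro, and the helpers) is a hypothetical simplification chosen for illustration; it is not the kernel's actual definition.

```c
#include <stddef.h>

/* Hypothetical, simplified malloc type descriptor. */
struct xt_malloc_type {
    const char *ks_shortdesc;   /* short description of the type */
    unsigned long ks_limit;     /* most bytes allowed; 0 means no limit */
    unsigned long ks_memuse;    /* bytes currently charged to the type */
};

/* Define a type and set its limit at definition time. */
#define XT_MALLOC_DEFINE_LIMIT(type, desc, limit)               \
    struct xt_malloc_type type[1] = { { (desc), (limit), 0 } }

XT_MALLOC_DEFINE_LIMIT(XT_M_EXAMPLE, "example", 4096);

/* Adjust the limit later with a function call. */
static void
xt_malloc_type_setlimit(struct xt_malloc_type *t, unsigned long limit)
{
    t->ks_limit = limit;
}

/* Charge an allocation to a type; 0 on success, -1 if over the limit. */
static int
xt_malloc_account(struct xt_malloc_type *t, unsigned long size)
{
    if (t->ks_limit != 0 && t->ks_memuse + size > t->ks_limit)
        return -1;
    t->ks_memuse += size;
    return 0;
}
```

Because callers pass a pointer to the descriptor rather than an integer constant, new types can be defined next to the code that uses them without editing a central table.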


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.
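
A toy sketch of the calling convention described above, with hypothetical names throughout. The real chooseproc() waits for a process to appear on the run queue; this stand-in simply returns whatever a one-entry "run queue" holds.

```c
#include <stddef.h>

struct xs_proc {
    int xp_pid;
};

/* Hypothetical one-entry "run queue" for the sketch. */
static struct xs_proc *xs_runqueue_head;

/* Scheduler policy: pick the next process to run. */
static struct xs_proc *
xs_chooseproc(void)
{
    return xs_runqueue_head;
}

/*
 * Context switch mechanism: the caller may name the next process
 * directly; if it passes NULL, the scheduler is asked to choose.
 * Returns 1 if a switch happened, 0 if there was nothing to do.
 */
static int
xs_mi_switch(struct xs_proc **curp, struct xs_proc *newp)
{
    if (newp == NULL)
        newp = xs_chooseproc();
    if (newp == NULL || newp == *curp)
        return 0;               /* "switching to self": skip the work */
    *curp = newp;
    return 1;
}
```

Keeping the choice of the next process outside the switch routine is what separates scheduling policy from the context-switch mechanism.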


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct emul to provide an emulation-specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
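
A minimal sketch of the dispatch described above: the trap handler calls the emulation's e_fault hook when one is set and falls back to the generic handler when it is NULL. All types and functions here are simplified, hypothetical stand-ins; the fault type and protection arguments from the prototypes above are left out.

```c
#include <stddef.h>

struct xf_vm_map {
    int dummy;
};

struct xf_proc {
    struct xf_vm_map *p_map;    /* the process address space */
};

/* Emulation hook takes the process, not the map (as in the commit). */
typedef int (*xf_fault_fn)(struct xf_proc *, unsigned long);

struct xf_emul {
    const char *e_name;
    xf_fault_fn e_fault;        /* NULL means use the generic handler */
};

enum { XF_GENERIC = 0, XF_EMUL = 1 };

/* Stand-in for the generic fault handler. */
static int
xf_uvm_fault(struct xf_vm_map *map, unsigned long va)
{
    (void)map;
    (void)va;
    return XF_GENERIC;
}

/* Stand-in for an emulation-specific handler. */
static int
xf_irix_vm_fault(struct xf_proc *p, unsigned long va)
{
    (void)p;
    (void)va;
    return XF_EMUL;
}

/* What a trap handler would do on a page fault. */
static int
xf_trap_fault(const struct xf_emul *e, struct xf_proc *p, unsigned long va)
{
    if (e->e_fault != NULL)
        return e->e_fault(p, va);
    return xf_uvm_fault(p->p_map, va);
}
```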


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.
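
A small sketch of trampoline selection by version, as outlined in the list above. The structure layout, field names, and helper are hypothetical simplifications; only the rule that version 0 means the legacy kernel-provided trampoline comes from the commit message.

```c
#include <stddef.h>

/* Hypothetical per-signal descriptor: handler plus trampoline info. */
struct xg_sigdesc {
    void (*sd_handler)(int);    /* registered signal handler */
    const void *sd_tramp;       /* userland-provided trampoline, if any */
    int sd_vers;                /* 0 = legacy kernel-provided trampoline */
};

/*
 * Pick the trampoline a sendsig()-like routine would use for a signal:
 * the kernel's own one for version 0, the registered one otherwise.
 */
static const void *
xg_select_tramp(const struct xg_sigdesc *sd, const void *kernel_tramp)
{
    if (sd->sd_vers == 0)
        return kernel_tramp;
    return sd->sd_tramp;
}
```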


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specificities should now be handled in OS specific code. Linux
has been done, but other emulations should be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data that has been collected so far:

                                    Net  Free  Open  Linux  SunOS  AIX  OSF1  Darwin
send SIGIO to write end of pipe     Y    N     N     N      N      N    Y     Y
send SIGIO to read end of pipe      Y    Y     N     N      N      ?    Y     ?
send SIGIO to write end of socket   Y    Y     Y     N      N      Y    Y     Y
send SIGIO to read end of socket    Y    Y     Y     Y      Y      ?    Y     ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke several
emulation package builds


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.


# 1.97 31-May-2000 thorpej

Track which CPU a process is running on/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.
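
The element-size argument described for KERN_PROC2 can be illustrated with a
minimal userland sketch. Everything here is hypothetical (struct record_sketch,
copy_records(), and the zero-fill policy are assumptions made for illustration);
it is not the kernel's implementation, only the general idea of writing a
caller-chosen number of bytes per record so that a growing structure does not
break older binaries.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record; stands in for a structure that may grow over time. */
struct record_sketch {
	int pid;
	int uid;
	int added_later;	/* field unknown to older binaries */
};

/*
 * Copy up to `count` records into `out`, writing exactly `elem_size`
 * bytes per record.  Fields beyond the caller's size are dropped; if
 * the caller's size is larger than the record, the tail is zero-filled.
 * Returns the number of records written.
 */
static size_t
copy_records(const struct record_sketch *src, size_t nsrc,
    void *out, size_t elem_size, size_t count)
{
	size_t n = nsrc < count ? nsrc : count;
	size_t ncopy = elem_size < sizeof(*src) ? elem_size : sizeof(*src);
	unsigned char *dst = out;

	for (size_t i = 0; i < n; i++) {
		memset(dst, 0, elem_size);
		memcpy(dst, &src[i], ncopy);
		dst += elem_size;
	}
	return n;
}
```

An older caller that only knows the first two fields would pass the size of its
own, smaller structure and still receive correctly strided records.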


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
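
The sizing rule described above (physical memory divided by 4, then clamped
to machine-dependent bounds) can be sketched as follows. The function name and
parameters are hypothetical and chosen only for illustration; the real kernel
code and the NKMEMPAGES_MIN/NKMEMPAGES_MAX defaults are not reproduced here.

```c
#include <stddef.h>

/*
 * Hypothetical helper: compute a page count as one quarter of physical
 * memory, then clamp it to the supplied lower and upper bounds.
 */
static size_t
nkmempages_autosize(size_t physpages, size_t min_pages, size_t max_pages)
{
	size_t n = physpages / 4;	/* a quarter of physical memory */

	if (n < min_pages)
		n = min_pages;		/* enforce the lower bound */
	if (n > max_pages)
		n = max_pages;		/* enforce the upper bound */
	return n;
}
```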


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). A process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock in
the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
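
A minimal sketch of what such a macro looks like, based only on the description
above. The numeric state values and struct proc_sketch are illustrative
assumptions; the real constants and structure are defined in <sys/proc.h>.

```c
/* Illustrative state values only; not the kernel's actual numbering. */
#define SRUN	2
#define SZOMB	5
#define SDEAD	6

/* Hypothetical stand-in for struct proc, reduced to the state field. */
struct proc_sketch {
	int p_stat;
};

/* Treat a process in either SZOMB or SDEAD as a zombie. */
#define P_ZOMBIE(p)	((p)->p_stat == SZOMB || (p)->p_stat == SDEAD)
```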


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adopt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
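
The effect of biasing by NZERO can be sketched as below. The helper names are
hypothetical, and the assumed user-visible nice range of -20..20 is an
assumption for illustration; only NZERO = 20 and the unsigned char type come
from the message above.

```c
#define NZERO	20	/* value named in the commit message */

/*
 * Hypothetical helper: store a nice value (assumed range -20..20) biased
 * by NZERO, so the stored value is never negative and fits in an
 * unsigned char.
 */
static unsigned char
nice_to_p_nice(int nice)
{
	return (unsigned char)(nice + NZERO);
}

/* Hypothetical inverse: recover the user-visible nice value. */
static int
p_nice_to_nice(unsigned char p_nice)
{
	return (int)p_nice - NZERO;
}
```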


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location, and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.343 07-Nov-2017 christos

Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->p_path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files nor Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API and its usage.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, so remove its traces without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible for setting global watchpoints/breakpoints with
the debug registers; the PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event monitoring
function is designed to be used for this purpose
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event;
a debugger is responsible for detaching this trap with an appropriate
PT_SETDBREGS call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently another interface,
PT_GET_SIGINFO/PT_SET_SIGINFO, is required in netbsd32 compat code in order to
reliably validate PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.
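
As an illustration of what each tracer ends up reinventing, here is a minimal
tracer-side sketch of the x86 DR7 control-register encoding. All names
(sketch_dr7_enable, SK_DR7_*) are hypothetical and are not taken from any
NetBSD or FreeBSD header; only the bit layout (local-enable bits at 2*slot,
R/W field at 16+4*slot, LEN field at 18+4*slot) follows the x86 architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the field encodings follow the x86 DR7 layout. */
enum sketch_dr7_rw {
	SK_DR7_EXEC  = 0x0,	/* break on instruction execution */
	SK_DR7_WRITE = 0x1,	/* break on data writes */
	SK_DR7_RDWR  = 0x3	/* break on data reads or writes */
};

enum sketch_dr7_len {
	SK_DR7_LEN1 = 0x0,	/* 1-byte region */
	SK_DR7_LEN2 = 0x1,	/* 2-byte region */
	SK_DR7_LEN8 = 0x2,	/* 8-byte region (64-bit mode) */
	SK_DR7_LEN4 = 0x3	/* 4-byte region */
};

/*
 * Return a copy of dr7 with the watchpoint in the given slot (0..3)
 * locally enabled and configured with the requested type and length.
 */
uint64_t
sketch_dr7_enable(uint64_t dr7, unsigned slot, enum sketch_dr7_rw rw,
    enum sketch_dr7_len len)
{
	assert(slot < 4);

	/* Clear the enable bits and the 4-bit R/W+LEN field of this slot. */
	dr7 &= ~((uint64_t)0x3 << (slot * 2));
	dr7 &= ~((uint64_t)0xf << (16 + slot * 4));

	dr7 |= (uint64_t)1 << (slot * 2);		/* local enable */
	dr7 |= (uint64_t)rw << (16 + slot * 4);		/* R/W field */
	dr7 |= (uint64_t)len << (18 + slot * 4);	/* LEN field */
	return dr7;
}
```

A tracer would presumably read the registers with PT_GETDBREGS, update the
control register with a helper of this kind, and write them back with
PT_SETDBREGS; that surrounding ptrace(2) sequence is not shown here.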

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps the size of ptrace_state_t unchanged, as both pid_t and lwpid_t
are defined as int32_t-like integers. This change does not break existing
prebuilt software and has minimal effect on the necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break the build of existing software.
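
The binary-compatibility argument above can be checked mechanically. The
following is a minimal sketch, assuming pid_t and lwpid_t are both 32-bit
integers as stated; the sketch_* typedefs, the old_/new_ struct names and
same_storage() are stand-ins introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel typedefs; both assumed to be 32-bit ints. */
typedef int32_t sketch_pid_t;
typedef int32_t sketch_lwpid_t;

/* Layout before the change. */
struct old_ptrace_state {
	int pe_report_event;
	sketch_pid_t pe_other_pid;
};

/* Layout after the change: the second member becomes a union. */
struct new_ptrace_state {
	int pe_report_event;
	union {
		sketch_pid_t _pe_other_pid;
		sketch_lwpid_t _pe_lwp;
	} _option;
};

/* Same size and same member offset, so prebuilt binaries are unaffected. */
static_assert(sizeof(struct old_ptrace_state) ==
    sizeof(struct new_ptrace_state), "size must not change");
static_assert(offsetof(struct old_ptrace_state, pe_other_pid) ==
    offsetof(struct new_ptrace_state, _option), "offset must not change");

/* Hypothetical helper: both union views alias the same storage. */
int
same_storage(void)
{
	struct new_ptrace_state s = { 0, { 0 } };

	s._option._pe_other_pid = 1234;
	return s._option._pe_lwp == 1234;
}
```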


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP termination is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when a
parent gives birth to a new child process and stops until it exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE notifies a debugger that a parent has resumed after a
vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
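
The commit message does not show the macro itself. A minimal sketch of the
idea, assuming a per-process flag marks 32-bit processes running on a 64-bit
kernel; SK_PK_32, struct sketch_proc, SK_PROC_PTRSZ and sketch_argv_bytes are
hypothetical names for this illustration and not the actual definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PK_32	0x00000004	/* hypothetical "32-bit process" flag */

struct sketch_proc {
	int p_flag;
};

/* Pointer size as seen by the process, not by the kernel. */
#define SK_PROC_PTRSZ(p) \
	(((p)->p_flag & SK_PK_32) ? sizeof(uint32_t) : sizeof(void *))

/* Bytes needed to hold nptrs user pointers (e.g. an argv vector) for p. */
size_t
sketch_argv_bytes(const struct sketch_proc *p, size_t nptrs)
{
	assert(p != NULL);
	return nptrs * SK_PROC_PTRSZ(p);
}
```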


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
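
A minimal sketch of how the three separate pieces can be recombined into a
composite wait(2)-style status, assuming the traditional BSD encoding (exit
code in bits 8-15, signal number in the low 7 bits, core flag 0200). The
struct, the p_coredumped member and sketch_wait_status() are hypothetical;
the real code keeps the core-dump indication in p_sflag.

```c
#include <assert.h>

#define SK_COREFLAG	0200	/* assumed core-dump bit of a wait status */

struct sketch_proc {
	int p_xexit;		/* exit code */
	int p_xsig;		/* terminating signal number, 0 if none */
	int p_coredumped;	/* stand-in for the core-dump flag bit */
};

/* Build the composite status from the split fields. */
int
sketch_wait_status(const struct sketch_proc *p)
{
	int sig = p->p_xsig;

	assert(sig >= 0 && sig < SK_COREFLAG);
	if (p->p_coredumped)
		sig |= SK_COREFLAG;
	return ((p->p_xexit & 0xff) << 8) | sig;
}
```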


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits the execution of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture, the
MAP_NOSYSCALLS flag is simply ignored with virtually no CPU cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.
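
A minimal sketch of a lock-free process counter in the spirit of this change.
It uses C11 <stdatomic.h> as a portable stand-in; the kernel's own atomic
primitives are not shown in the commit message, and sketch_nprocs and the
sketch_proc_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical global process count, updated without a lock. */
static atomic_uint sketch_nprocs;

/* Try to account for one more process; returns 0 if the limit is hit. */
int
sketch_proc_reserve(unsigned maxproc)
{
	unsigned cur = atomic_load(&sketch_nprocs);

	do {
		if (cur >= maxproc)
			return 0;
		/* On failure, cur is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&sketch_nprocs, &cur, cur + 1));
	return 1;
}

/* Drop the accounting for one process. */
void
sketch_proc_release(void)
{
	unsigned old = atomic_fetch_sub(&sketch_nprocs, 1);

	assert(old > 0);
	(void)old;
}

/* Read the current count. */
unsigned
sketch_proc_count(void)
{
	return atomic_load(&sketch_nprocs);
}
```

The compare-and-swap loop keeps the limit check and the increment atomic with
respect to each other, which a plain atomic increment alone would not do.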


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give the same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time.


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.
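
A minimal sketch of the per-LWP credential reference described above,
assuming a simple reference count. All type and function names here
(cred_sketch, lwp_refresh_cred, and so on) are hypothetical, and the locking
that real kernel code would need is omitted:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct cred_sketch { int refcnt; int uid; };
struct proc_sketch { struct cred_sketch *p_cred; };
struct lwp_sketch {
    struct proc_sketch *l_proc;
    struct cred_sketch *l_cred;   /* cached reference, may be stale */
};

static void cred_hold(struct cred_sketch *c) { c->refcnt++; }

static void cred_release(struct cred_sketch *c)
{
    assert(c->refcnt > 0);
    c->refcnt--;
}

/*
 * Called on syscall or user trap entry in this sketch: if the process
 * credential has changed, take a reference on the new one and drop the
 * reference on the old one.
 */
static void lwp_refresh_cred(struct lwp_sketch *l)
{
    struct cred_sketch *cur = l->l_proc->p_cred;

    if (l->l_cred == cur)
        return;                   /* common case: nothing changed */
    cred_hold(cur);
    if (l->l_cred != NULL)
        cred_release(l->l_cred);
    l->l_cred = cur;
}
```

The point of the cached pointer is that the common path is a single pointer
comparison; the reference counts are only touched when p_cred has changed.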


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout from the proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but no longer do.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem in which p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 Hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu are not likely increased.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
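
A rough sketch of why a fixed-point p_estcpu decays more slowly. The decay
formula used here (multiply by 2*load / (2*load + 1)) and the FSHIFT value
are assumptions for illustration; the log itself does not state them, and
the function names are hypothetical:

```c
#include <stdint.h>

#define FSHIFT 11                  /* assumed number of fraction bits */
#define FSCALE (1u << FSHIFT)

typedef uint32_t fixpt_t;

/*
 * Integer estcpu: integer division always removes at least 1 from a
 * non-zero value, however high the load is.
 */
static unsigned decay_int(unsigned load, unsigned estcpu)
{
    unsigned loadfac = 2 * load;

    return (loadfac * estcpu) / (loadfac + 1);
}

/*
 * Fixed-point estcpu: loadfac and estcpu are both scaled by FSCALE, so
 * the decay can remove a fraction of one unit.
 */
static fixpt_t decay_fixpt(fixpt_t loadfac, fixpt_t estcpu)
{
    return (fixpt_t)(((uint64_t)loadfac * estcpu) /
        ((uint64_t)loadfac + FSCALE));
}
```

With a load of 20 and an estcpu of 16, decay_int() yields 15 (a full unit
lost), while decay_fixpt(2 * 20 * FSCALE, 16 * FSCALE) yields 31968, which
is about 15.6 units.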


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
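
A sketch of an iteration macro that skips marker entries, in the spirit of
PROCLIST_FOREACH. The structure, the is_marker flag, and the macro name are
hypothetical; the real macro operates on struct proc and the kernel list
macros:

```c
#include <stddef.h>

/* Hypothetical list node; markers are placeholders left by an iterator. */
struct entry {
    struct entry *next;
    int is_marker;
    int pid;
};

/*
 * Walk the list, skipping markers. The trailing "else" lets the caller's
 * loop body attach to the macro like an ordinary for statement.
 */
#define ENTRY_FOREACH(var, head)                                   \
    for ((var) = (head); (var) != NULL; (var) = (var)->next)       \
        if ((var)->is_marker)                                      \
            continue;                                              \
        else

/* Count the real (non-marker) entries on the list. */
static int count_real(struct entry *head)
{
    struct entry *e;
    int n = 0;

    ENTRY_FOREACH(e, head) {
        n++;
    }
    return n;
}
```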


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep while awaiting the debugger's opinion
about the signal; this is not a problem. The hook is hence located in
issignal, at the place where normally SIGCHLD is sent to the debugger
and the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception), and stop the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)
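
The log does not say why the cast was moved. One difference between the two
spellings, shown here with a hypothetical narrow unsigned id type rather
than pid_t, is that negating after the cast is subject to integer promotion
(this assumes int is wider than 16 bits):

```c
#include <stdint.h>

typedef uint16_t small_id_t;   /* hypothetical id type, not pid_t */

/* Negate first, then cast: the result has the id type. */
#define ID_NONE_CAST_OUTER    ((small_id_t)-1)

/* Cast first, then negate: the operand is promoted to int. */
#define ID_NONE_NEGATE_OUTER  (-(small_id_t)1)

static long cast_outer_value(void)
{
    return (long)ID_NONE_CAST_OUTER;     /* largest small_id_t value */
}

static long negate_outer_value(void)
{
    return (long)ID_NONE_NEGATE_OUTER;   /* plain int -1 */
}
```

For a signed 32-bit pid_t both spellings evaluate to -1; the cast-outermost
form keeps the constant in the id type whatever that type is.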


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct emul to provide an emulation specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
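
A minimal sketch of the "NULL means use the default handler" dispatch
described for e_fault. The types and function names are simplified,
hypothetical stand-ins; the real hooks take the arguments shown in the
prototypes above:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified hook signature. */
typedef int (*fault_hook_t)(void *ctx, unsigned long va);

struct emul_sketch {
    const char *e_name;
    fault_hook_t e_fault;     /* NULL: fall back to the default handler */
};

enum { DEFAULT_HANDLED = 1, EMUL_HANDLED = 2 };

/* Stand-in for the generic fault handler. */
static int default_fault(void *ctx, unsigned long va)
{
    (void)ctx;
    (void)va;
    return DEFAULT_HANDLED;
}

/* Stand-in for an emulation specific handler. */
static int emul_specific_fault(void *ctx, unsigned long va)
{
    (void)ctx;
    (void)va;
    return EMUL_HANDLED;
}

/* What a trap handler would do in this sketch. */
static int dispatch_fault(const struct emul_sketch *e, void *ctx,
    unsigned long va)
{
    assert(e != NULL);
    if (e->e_fault != NULL)
        return e->e_fault(ctx, va);
    return default_fault(ctx, va);
}
```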


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed the obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specifics should now be handled in OS specific code. Linux
has been done, but other emulations still need to be handled. See case
LINUX_F_SETFL in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for
more details.

The data collected so far:

                                   Net  Free  Open  Linux  SunOS  AIX  OSF1  Darwin
send SIGIO to write end of pipe    Y    N     N     N      N      N    Y     Y
send SIGIO to read end of pipe     Y    Y     N     N      N      ?    Y     ?
send SIGIO to write end of socket  Y    Y     Y     N      N      Y    Y     Y
send SIGIO to read end of socket   Y    Y     Y     Y      Y      ?    Y     ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke the build
of several emulation packages


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.
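
A minimal user-space sketch of the hook arrangement described in 1.106. Only
the names p_emuldata, e_proc_fork, e_proc_exec and e_proc_exit come from the
log entry; the hook signatures, the p_emul pointer, the trimmed structures and
every demo_* name are assumptions made for illustration, not the kernel's
actual declarations.

```c
#include <stddef.h>

struct proc;

/* Hypothetical, trimmed-down struct emul: only the three hooks named above. */
struct emul {
	const char *e_name;
	void (*e_proc_fork)(struct proc *child, struct proc *parent);
	void (*e_proc_exec)(struct proc *p);
	void (*e_proc_exit)(struct proc *p);
};

/* Hypothetical, trimmed-down struct proc. */
struct proc {
	const struct emul *p_emul;	/* assumed link to the emulation */
	void *p_emuldata;		/* per-process emulation-specific data */
};

/* Call the exit hook only if the emulation supplies one. */
static void
demo_call_exit_hook(struct proc *p)
{
	if (p->p_emul != NULL && p->p_emul->e_proc_exit != NULL)
		(*p->p_emul->e_proc_exit)(p);
}

/* Example emulation that hangs private state off p_emuldata. */
static int demo_state;

static void
demo_fork(struct proc *child, struct proc *parent)
{
	(void)parent;
	child->p_emuldata = &demo_state;
}

static void
demo_exit(struct proc *p)
{
	p->p_emuldata = NULL;
	demo_state++;		/* count exits seen by this emulation */
}

/* The exec hook is left NULL to show that hooks are optional. */
static const struct emul demo_emul = { "demo", demo_fork, NULL, demo_exit };
```

Checking each hook for NULL before calling it lets an emulation leave out the
hooks it does not need.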


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
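
The compatibility macro described in 1.98 can be sketched in user space as
follows. The demo_ltsleep() stub, its argument list and struct demo_lock are
stand-ins invented for this illustration; the stub only models the "release
the interlock" step and does not sleep.

```c
#include <stddef.h>

/* Hypothetical stand-in for a simple lock used as the interlock. */
struct demo_lock {
	int held;
};

/*
 * Stub modelled on the description above: a real kernel would lock the
 * scheduler first, then release the interlock, then sleep on chan.
 * Here only the interlock release is modelled.
 */
static int
demo_ltsleep(const void *chan, int pri, const char *wmesg, int timo,
    struct demo_lock *interlock)
{
	(void)chan; (void)pri; (void)wmesg; (void)timo;
	if (interlock != NULL)
		interlock->held = 0;
	return 0;
}

/* Old-style API: the same call with no interlock. */
#define demo_tsleep(chan, pri, wmesg, timo) \
	demo_ltsleep((chan), (pri), (wmesg), (timo), NULL)
```

Callers that never needed an interlock keep compiling unchanged, while
MP-aware callers pass the lock that protects the condition they sleep on.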


# 1.97 31-May-2000 thorpej

Track which CPU a process is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.
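
The "caller passes the element size" idea behind KERN_PROC2 can be sketched as
below. The two demo structures and demo_copy_records() are hypothetical; they
only show why an old binary keeps working after the kernel-side structure
grows, assuming new fields are appended at the end.

```c
#include <stddef.h>
#include <string.h>

/* Layout an old binary was compiled against (hypothetical). */
struct demo_kinfo_v1 {
	int pid;
	int uid;
};

/* Newer kernel layout: same prefix, one field appended (hypothetical). */
struct demo_kinfo_v2 {
	int pid;
	int uid;
	int cpuid;
};

/*
 * Copy n records into dst, where each destination slot is elemsize bytes.
 * Only the part that fits in the caller's element is copied; any extra
 * space in a larger slot is zero-filled.
 */
static size_t
demo_copy_records(void *dst, size_t elemsize,
    const struct demo_kinfo_v2 *src, size_t n)
{
	size_t len = elemsize < sizeof(*src) ? elemsize : sizeof(*src);
	char *out = dst;
	size_t i;

	for (i = 0; i < n; i++) {
		memset(out + i * elemsize, 0, elemsize);
		memcpy(out + i * elemsize, &src[i], len);
	}
	return n * elemsize;
}
```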


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
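
The sizing rule in 1.86 amounts to a divide-and-clamp. A minimal sketch,
assuming page counts as plain unsigned long values; the function name and its
parameters are hypothetical, not the kernel's.

```c
/*
 * Take a quarter of physical memory (in pages), then apply the lower and
 * upper bounds that machine-dependent code or the kernel config supplies.
 */
static unsigned long
demo_nkmempages(unsigned long physpages, unsigned long min_pages,
    unsigned long max_pages)
{
	unsigned long n = physpages / 4;

	if (n < min_pages)
		n = min_pages;
	if (n > max_pages)
		n = max_pages;
	return n;
}
```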


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). A process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock every where else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
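
A sketch of what a P_ZOMBIE()-style test looks like, based only on the
description in 1.79. The numeric state values and the one-field struct are
illustrative assumptions.

```c
/* Process states; the numeric values here are illustrative only. */
enum demo_pstate {
	DEMO_SIDL = 1,
	DEMO_SRUN,
	DEMO_SSLEEP,
	DEMO_SSTOP,
	DEMO_SZOMB,	/* reaped by the reaper, ready for wait4 */
	DEMO_SDEAD	/* dead, but not yet processed by the reaper */
};

struct demo_proc {
	enum demo_pstate p_stat;
};

/* Treat both "dead" and "zombie" as a zombie for most callers. */
#define DEMO_P_ZOMBIE(p) \
	((p)->p_stat == DEMO_SZOMB || (p)->p_stat == DEMO_SDEAD)
```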


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is all the
header files as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
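
The effect of 1.46 can be sketched as a bias applied when storing the nice
value. This assumes user-visible nice values lie in [-20, 20]; DEMO_NZERO and
the helper names are hypothetical stand-ins.

```c
#define DEMO_NZERO 20	/* bias taken from the log entry above */

/* Store a user-visible nice value as a non-negative unsigned char. */
static unsigned char
demo_nice_to_p_nice(int nice)
{
	return (unsigned char)(nice + DEMO_NZERO);
}

/* Recover the user-visible nice value from the stored form. */
static int
demo_p_nice_to_nice(unsigned char p_nice)
{
	return (int)p_nice - DEMO_NZERO;
}
```

With a bias of 20 the stored range is 0 to 40, so an unsigned field can hold
every legal value without sign problems.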


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location , and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.342 28-Aug-2017 kamil

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not suitable for modern
applications beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.341 01-Jul-2017 khorben

Typo


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API and its usage.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, so its traces are removed without any
backward-compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently another interface,
PT_GET_SIGINFO/PT_SET_SIGINFO, is required in netbsd32 compat code in order to
reliably validate PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
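
The size argument above can be checked at compile time. In this sketch the
demo_* typedefs stand in for pid_t and lwpid_t under the stated assumption
that both are int32_t-like; the structure names are hypothetical copies of the
two layouts quoted in the log entry.

```c
#include <stdint.h>

typedef int32_t demo_pid_t;	/* assumed: pid_t is int32_t-like */
typedef int32_t demo_lwpid_t;	/* assumed: lwpid_t is int32_t-like */

/* Old layout, as quoted above. */
struct demo_old_ptrace_state {
	int pe_report_event;
	demo_pid_t pe_other_pid;
};

/* New layout: the second member becomes a union of two same-sized ids. */
typedef struct demo_ptrace_state {
	int pe_report_event;
	union {
		demo_pid_t _pe_other_pid;
		demo_lwpid_t _pe_lwp;
	} _option;
} demo_ptrace_state_t;

/* Accessor macros keep the old member name usable in source code. */
#define pe_other_pid	_option._pe_other_pid
#define pe_lwp		_option._pe_lwp

/* Fails to compile if the union changed the structure size. */
_Static_assert(sizeof(struct demo_old_ptrace_state) ==
    sizeof(demo_ptrace_state_t), "ptrace_state size must not change");
```

Because both union members have the same size and alignment, the structure
size is unchanged, and the accessor macros let existing code that uses
st.pe_other_pid keep compiling.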


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP exit is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
a parent gives birth to a new child process and stops until the child
exits or calls exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE notifies a debugger that a parent has resumed after a
vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
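
A minimal sketch of the idea, assuming the pointer size is chosen from a
per-process "32-bit emulation" flag. The flag value, the struct, and the
SKETCH_ names are hypothetical and are not copied from the header:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit marking a process that runs under 32-bit emulation. */
#define SKETCH_PK_32	0x00000004

/* Minimal stand-in for the part of struct proc the macro looks at. */
struct proc_sketch {
	int p_flag;
};

/* Pointer size as seen by the process, not by the 64-bit kernel. */
#define SKETCH_PROC_PTRSZ(p) \
	(((p)->p_flag & SKETCH_PK_32) ? sizeof(uint32_t) : sizeof(void *))

/* Bytes needed for an array of n pointers in the process's own layout. */
size_t
ptr_array_bytes(const struct proc_sketch *p, size_t n)
{
	assert(p != NULL);
	return n * SKETCH_PROC_PTRSZ(p);
}
```

Code that walks user-space pointer arrays can then step by the process's
pointer size instead of assuming sizeof(void *).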


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
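
As a rough illustration of how the three pieces can be recombined into a
wait(2)-style status word, here is a userland sketch. It assumes the
traditional encoding (exit code in bits 8-15, signal number in bits 0-6,
core flag 0x80); the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Core-dump bit in the traditional status encoding (assumed here). */
#define SKETCH_COREFLAG	0x80

/* Hypothetical stand-in for the three separate pieces of exit state. */
struct exit_info {
	int	xexit;		/* exit code, like p_xexit */
	int	xsig;		/* signal number, like p_xsig; 0 if none */
	bool	coredump;	/* stands in for the core-dumped flag bit */
};

/* Build a composite status from the separate fields. */
int
compose_wait_status(const struct exit_info *ei)
{
	assert(ei->xexit >= 0 && ei->xexit <= 0xff);
	assert(ei->xsig >= 0 && ei->xsig <= 0x7f);
	return (ei->xexit << 8) | ei->xsig |
	    (ei->coredump ? SKETCH_COREFLAG : 0);
}
```

Keeping the fields separate means each can be read without knowing which
interpretation of a shared word applies in a given context.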


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul', so welcome to 7.99.23!


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adopt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
another LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits the execution of
system calls from the mapped region. This can be used for emulation purposes
or for extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture, the
MAP_NOSYSCALLS flag is simply ignored with virtually no CPU cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.
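
The idea can be sketched in userland with C11 atomics. This is only an
illustration: the counter and function names are hypothetical, and the
kernel uses its own atomic primitives rather than <stdatomic.h>:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical process counter, updated without taking a lock. */
static atomic_uint nprocs_sketch;

/* Account for a new process; returns the updated count. */
unsigned
proc_count_inc(void)
{
	return atomic_fetch_add(&nprocs_sketch, 1) + 1;
}

/* Account for a process going away; returns the updated count. */
unsigned
proc_count_dec(void)
{
	unsigned prev = atomic_fetch_sub(&nprocs_sketch, 1);

	assert(prev > 0);	/* the counter must never underflow */
	return prev - 1;
}

/* Read the current count. */
unsigned
proc_count_get(void)
{
	return atomic_load(&nprocs_sketch);
}
```

Each update is a single read-modify-write, so concurrent callers cannot
lose an increment even though no mutex is held.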


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, the lock is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.
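
One plausible way a sub-microsecond timeout can turn into an untimed sleep
is a truncating unit conversion combined with the convention that a
timeout of 0 means "no timeout". The sketch below shows that class of bug
and a rounding fix; it is an assumption for illustration and not the
actual kernel code, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convention assumed in this sketch: a converted timeout of 0 is
 * interpreted by the sleep primitive as "sleep without a timeout".
 */

/* Truncating conversion: any interval below 1000 ns collapses to 0. */
uint64_t
ns_to_us_truncating(uint64_t ns)
{
	return ns / 1000;
}

/* Rounding up keeps every nonzero interval nonzero. */
uint64_t
ns_to_us_roundup(uint64_t ns)
{
	assert(ns <= UINT64_MAX - 999);	/* avoid overflow in the addition */
	return (ns + 999) / 1000;
}
```

With the truncating version a 500 ns wait becomes 0 and is treated as
untimed; with rounding it becomes 1 microsecond and still expires.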


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give the same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time.


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout from the proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but do no more.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.
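
The three steps above can be sketched roughly as follows. This is only an
illustration: struct sproc, fork_hook() and wait_hook() are hypothetical
names, not the kernel's, and the time-based decay of the inherited share
(the part factored out of updatepri) is left out.

```c
#include <assert.h>

/* Hypothetical, simplified process record; not the kernel's struct proc. */
struct sproc {
	unsigned int estcpu;           /* estimated recent CPU usage */
	unsigned int estcpu_inherited; /* share copied from the parent at fork */
};

/* At fork: the child starts from the parent's estcpu and remembers how much. */
static void
fork_hook(const struct sproc *parent, struct sproc *child)
{
	child->estcpu = parent->estcpu;
	child->estcpu_inherited = parent->estcpu;
}

/* At wait: charge the parent only for what the child accumulated itself. */
static void
wait_hook(struct sproc *parent, const struct sproc *child)
{
	if (child->estcpu > child->estcpu_inherited)
		parent->estcpu += child->estcpu - child->estcpu_inherited;
}
```

Without the inherited-share bookkeeping, a child that did no work at all
would still hand its copied estcpu back to the parent, doubling it on each
fork-wait cycle.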


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is not likely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
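
A minimal sketch of why the fixed-point representation helps. It assumes
the classic BSD decay filter 2*load/(2*load+1) and an FSHIFT of 11; the
function names decay_int() and decay_fixpt() are made up for this
illustration and the load is passed as a plain integer for simplicity.

```c
#include <stdint.h>

#define FSHIFT 11              /* fractional bits; assumed for this sketch */
#define FSCALE (1 << FSHIFT)   /* fixed-point representation of 1.0 */

typedef uint32_t fixpt_t;

/* Integer estcpu: truncation makes any non-zero value drop by at least 1. */
static unsigned int
decay_int(unsigned int estcpu, unsigned int load)
{
	return (2 * load * estcpu) / (2 * load + 1);
}

/* Fixed-point estcpu: the fractional part survives the decay step. */
static fixpt_t
decay_fixpt(fixpt_t estcpu, unsigned int load)
{
	return (fixpt_t)(((uint64_t)2 * load * estcpu) / (2 * load + 1));
}
```

With a load of 20, decay_int(1, 20) truncates to 0, whereas
decay_fixpt(FSCALE, 20) gives 1998, roughly 0.98 of a tick, so a process
that only collects an occasional tick still builds up a non-zero estimate.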


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
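
The marker-skipping iteration can be sketched like this. The names
struct xproc, XP_MARKER and XPROC_FOREACH are hypothetical stand-ins on a
hand-rolled singly linked list; the real macro works on the kernel's
proclist and may differ in detail.

```c
#include <stddef.h>

#define XP_MARKER 0x1   /* hypothetical flag: placeholder entry, not a process */

struct xproc {
	struct xproc *next;
	int flags;
	int pid;
};

/* Walk the list like a plain FOREACH, but skip marker entries. */
#define XPROC_FOREACH(var, head)					\
	for ((var) = (head); (var) != NULL; (var) = (var)->next)	\
		if (((var)->flags & XP_MARKER) != 0)			\
			continue;					\
		else

/* Count the entries that represent real processes. */
static int
count_real(struct xproc *head)
{
	struct xproc *p;
	int n = 0;

	XPROC_FOREACH(p, head)
		n++;
	return n;
}
```

The trailing "else" lets the macro be followed by a single statement or a
braced block, just like an ordinary for loop.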


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep while waiting for the debugger's
opinion about the signal without causing a problem. The hook is hence
located in issignal, at the place where SIGCHLD is normally sent to the
debugger and the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception) and stopping the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct emul to provide an emulation-specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and to synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
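
The "NULL means use the default" convention for the hook can be sketched
as below. Everything here is a stand-in: struct xemul, handle_fault() and
default_fault() are hypothetical, and the argument types are simplified
compared with the prototypes quoted above.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down emulation descriptor. */
struct xemul {
	const char *e_name;
	/* Emulation-specific fault handler; NULL selects the default. */
	int (*e_fault)(void *p, unsigned long va, int prot);
};

/* Stand-in for the generic fault handler. */
static int
default_fault(void *p, unsigned long va, int prot)
{
	(void)p; (void)va; (void)prot;
	return 0;
}

/* What a trap handler would do: prefer the emulation hook if one is set. */
static int
handle_fault(const struct xemul *e, void *p, unsigned long va, int prot)
{
	if (e->e_fault != NULL)
		return (*e->e_fault)(p, va, prot);
	return default_fault(p, va, prot);
}
```

Keeping the check in one dispatch function means emulations without special
needs only have to leave the field NULL.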


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.
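
A rough sketch of the version-based trampoline selection described in the
list above. The structure and field names (xsigdesc, sd_handler, sd_tramp,
sd_vers) and the legacy_tramp object are hypothetical stand-ins; only the
rule "version 0 means the legacy kernel-provided trampoline" comes from
the text.

```c
#include <stddef.h>

/* Hypothetical per-signal descriptor: handler plus trampoline and version. */
struct xsigdesc {
	void (*sd_handler)(int);
	const void *sd_tramp;   /* userland-provided trampoline, if any */
	int sd_vers;            /* 0: legacy kernel-provided trampoline */
};

/* Stand-in for the trampoline the kernel supplies itself. */
static const char legacy_tramp[] = "kernel-provided";

/* Pick the trampoline a sendsig()-like function would use. */
static const void *
select_trampoline(const struct xsigdesc *sd)
{
	if (sd->sd_vers == 0)
		return legacy_tramp;
	return sd->sd_tramp;
}
```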

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specifics should now be handled in OS specific code. Linux
has been done, but other emulations still need to be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data that has been collected so far:

                                   Net  Free  Open  Linux  SunOS  AIX  OSF1  Darwin
send SIGIO to write end of pipe    Y    N     N     N      N      N    Y     Y
send SIGIO to read end of pipe     Y    Y     N     N      N      ?    Y     ?
send SIGIO to write end of socket  Y    Y     Y     N      N      Y    Y     Y
send SIGIO to read end of socket   Y    Y     Y     Y      Y      ?    Y     ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke several
emulation package build


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN, clear the P_SUSPEND
flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
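
The shape of the new interface can be sketched as follows. This is a
user-space stand-in under stated assumptions: struct interlock,
ltsleep_sketch() and tsleep_sketch() are hypothetical names, and the
function only models the release of the interlock, not any real sleeping
or scheduler locking.

```c
#include <stddef.h>

/* Hypothetical interlock; the real kernel passes a simple lock here. */
struct interlock {
	int held;
};

/*
 * Sleep on chan. In the real code the interlock is dropped only after the
 * scheduler is locked, so a wakeup cannot slip in between; here we merely
 * model that the interlock is released on the caller's behalf.
 */
static int
ltsleep_sketch(const void *chan, int pri, const char *wmesg, int timo,
    struct interlock *il)
{
	(void)chan; (void)pri; (void)wmesg; (void)timo;
	if (il != NULL)
		il->held = 0;
	return 0;
}

/* Old API: same call without an interlock. */
#define tsleep_sketch(chan, pri, wmesg, timo)				\
	ltsleep_sketch((chan), (pri), (wmesg), (timo), NULL)
```

Existing callers keep compiling unchanged through the macro, while new
MP-aware callers can hand in the lock that protects their wait condition.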


# 1.97 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.
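
The "caller passes the element size" idea behind KERN_PROC2 can be sketched
as below. This is an illustration only: struct demo_kinfo and copy_records
are hypothetical names, and the real struct kinfo_proc2 and sysctl handler
are far larger.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record; pretend new_field was added in a later kernel. */
struct demo_kinfo {
	int pid;
	int ppid;
	int new_field;
};

/*
 * Copy count records into out, giving each record exactly elemsize
 * bytes in the caller's buffer. A caller built against an older,
 * shorter layout gets a truncated prefix; a caller asking for more
 * than the kernel knows gets zero padding. Returns bytes written.
 */
static size_t
copy_records(void *out, size_t elemsize, const struct demo_kinfo *src,
    size_t count)
{
	unsigned char *dst = out;
	size_t ncopy = elemsize < sizeof(*src) ? elemsize : sizeof(*src);

	for (size_t i = 0; i < count; i++) {
		memset(dst, 0, elemsize);
		memcpy(dst, &src[i], ncopy);
		dst += elemsize;
	}
	return elemsize * count;
}
```

The scheme only works if new fields are appended at the end of the structure,
so that an old binary's shorter layout is always a prefix of the new one.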


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
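
The sizing rule above (physical memory divided by 4, then clamped between
bounds) can be sketched as follows. The function name and parameters are
hypothetical; in the kernel the bounds come from NKMEMPAGES_MIN and
NKMEMPAGES_MAX or from machine-dependent defaults.

```c
#include <assert.h>

/*
 * Return the number of kmem pages for a machine with physpages pages
 * of physical memory: one quarter of memory, but never less than
 * min_pages and never more than max_pages.
 */
static unsigned long
autosize_nkmempages(unsigned long physpages, unsigned long min_pages,
    unsigned long max_pages)
{
	unsigned long npages;

	assert(min_pages <= max_pages);
	npages = physpages / 4;
	if (npages > max_pages)
		npages = max_pages;
	if (npages < min_pages)
		npages = min_pages;
	return npages;
}
```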


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace the kern.shortcorename sysctl with a more flexible scheme,
a core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). A process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock in
the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock every where else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
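
A small sketch of the state handling described above, using placeholder
names and values. The DEMO_ identifiers are hypothetical; the numeric values
are not the kernel's.

```c
#include <assert.h>

/* Placeholder process states; values are illustrative only. */
enum demo_pstat {
	DEMO_SIDL = 1,
	DEMO_SRUN,
	DEMO_SSLEEP,
	DEMO_SSTOP,
	DEMO_SZOMB,
	DEMO_SDEAD
};

struct demo_proc {
	enum demo_pstat p_stat;
};

/* Treat both "dead, not yet reaped" and "zombie" as a zombie. */
#define DEMO_P_ZOMBIE(p) \
	((p)->p_stat == DEMO_SZOMB || (p)->p_stat == DEMO_SDEAD)

/* Reaper step: a dead process becomes a zombie that wait4 can collect. */
static void
demo_reap(struct demo_proc *p)
{
	assert(p->p_stat == DEMO_SDEAD);
	p->p_stat = DEMO_SZOMB;
}
```

Code that only cares whether a process is still alive can test the macro and
ignore which of the two exit stages the process is in.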


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is all the
header files as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
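
The bias described above can be sketched as a pair of conversions. The
DEMO_NZERO constant and function names are hypothetical, and the user-visible
nice range of -20..20 is an assumption of this sketch.

```c
#include <assert.h>

#define DEMO_NZERO 20	/* bias so that the stored value is never negative */

/* Convert a user-visible nice value to the biased, unsigned stored form. */
static unsigned char
nice_to_p_nice(int nice)
{
	assert(nice >= -DEMO_NZERO && nice <= DEMO_NZERO);
	return (unsigned char)(nice + DEMO_NZERO);
}

/* Convert the stored form back to the user-visible nice value. */
static int
p_nice_to_nice(unsigned char p_nice)
{
	return (int)p_nice - DEMO_NZERO;
}
```

Because the stored value is always in 0..40, declaring the field as an
unsigned char removes any dependence on whether plain char is signed.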


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location , and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.341 01-Jul-2017 khorben

Typo


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API.

This replaces the previous watchpoint API, which was introduced only
recently in NetBSD-current; its remnants are removed without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
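
The layout claim above can be checked at compile time with a sketch like the
following. The demo_ typedefs and struct names are hypothetical stand-ins;
the sketch assumes, as the text says, that both ID types are 32-bit integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for pid_t and lwpid_t, both assumed to be int32_t-like. */
typedef int32_t demo_pid_t;
typedef int32_t demo_lwpid_t;

/* Layout before the change. */
struct old_state {
	int		pe_report_event;
	demo_pid_t	pe_other_pid;
};

/* Layout after the change: the second member becomes a union. */
struct new_state {
	int		pe_report_event;
	union {
		demo_pid_t	_pe_other_pid;
		demo_lwpid_t	_pe_lwp;
	} _option;
};

/* Same size and same member offset, so old binaries see the same bytes. */
_Static_assert(sizeof(struct old_state) == sizeof(struct new_state),
    "size must not change");
_Static_assert(offsetof(struct old_state, pe_other_pid) ==
    offsetof(struct new_state, _option),
    "second member must stay at the same offset");

/* Accessor macros keep the old member name usable in source code. */
#define pe_other_pid	_option._pe_other_pid
#define pe_lwp		_option._pe_lwp
```

The static assertions are placed before the accessor macros on purpose, since
the macro would otherwise rewrite the old structure's member name.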


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP termination is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
parent gives birth to new process child and stops till it exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE is a notification that tells a debugger that a parent has
resumed after a vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
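
How the three separate pieces can be recombined into a wait(2) status word is
sketched below, assuming the traditional encoding (exit code in bits 8..15,
signal number in the low 7 bits, core flag in bit 7). The struct, function
and DEMO_COREFLAG names are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <sys/wait.h>

#define DEMO_COREFLAG 0x80	/* assumed position of the core-dump bit */

/* The three pieces that used to be packed into a single p_xstat. */
struct demo_exit {
	int p_xexit;		/* exit code, 0..255 */
	int p_xsig;		/* terminating signal, or 0 */
	int coredumped;		/* nonzero if a core file was written */
};

/* Build the composite status that a waiting parent would receive. */
static int
demo_wait_status(const struct demo_exit *e)
{
	assert(e->p_xexit >= 0 && e->p_xexit <= 0xff);
	assert(e->p_xsig >= 0 && e->p_xsig < 0x7f);
	return (e->p_xexit << 8) | e->p_xsig |
	    (e->coredumped ? DEMO_COREFLAG : 0);
}
```

Keeping the fields separate means each consumer reads exactly the piece it
needs, and the packed form is built only at the point where it is reported.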


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub to call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits execution of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no CPU cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.
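
As a rough illustration of the idea (not the kernel's code), a counter shared between CPUs can be kept consistent with atomic read-modify-write operations instead of a lock. The sketch below uses C11 <stdatomic.h>; the names proc_count, proc_count_inc, proc_count_dec and proc_count_read are hypothetical, and the kernel relies on its own atomic primitives rather than C11 atomics.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a global process counter (starts at zero). */
static atomic_uint proc_count;

/* Atomically add one to the counter and return the new value. */
unsigned proc_count_inc(void)
{
    return atomic_fetch_add(&proc_count, 1u) + 1u;
}

/* Atomically subtract one from the counter and return the new value. */
unsigned proc_count_dec(void)
{
    return atomic_fetch_sub(&proc_count, 1u) - 1u;
}

/* Read the current value without taking any lock. */
unsigned proc_count_read(void)
{
    return atomic_load(&proc_count);
}
```

Each update is a single indivisible operation, so concurrent callers cannot lose an increment or decrement even though no mutex is held.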


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.
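
The first item describes a general hazard when a relative timeout is converted to clock ticks: if the conversion can yield zero, and zero is the value that means "no timeout", a very short timed sleep silently becomes an untimed one. The sketch below only illustrates that hazard and one way to avoid it; the function name timeout_to_ticks, the 100 Hz tick rate, and the "zero means untimed" convention are assumptions for the example, not the code changed by this revision.

```c
#include <stdint.h>

/* Assumed clock rate for the sketch: 100 ticks per second. */
#define SKETCH_HZ      100
#define USEC_PER_TICK  (1000000 / SKETCH_HZ)

/*
 * Convert a relative timeout in microseconds to clock ticks.
 * Under the convention assumed here, a tick count of 0 would mean
 * "sleep with no timeout", so the result is never allowed to be 0.
 */
int timeout_to_ticks(int64_t usec)
{
    /* Already due (or in the past): use the shortest timed sleep. */
    if (usec <= 0)
        return 1;

    /* Clamp very large timeouts so the rounding below cannot overflow. */
    if (usec > (int64_t)INT32_MAX * USEC_PER_TICK)
        return INT32_MAX;

    /* Round up, so a sub-tick timeout becomes one tick instead of zero. */
    return (int)((usec + USEC_PER_TICK - 1) / USEC_PER_TICK);
}
```

Rounding down instead of up would map any timeout shorter than one tick to zero, which is the kind of accidental untimed sleep the fix guards against.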


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time.


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout from proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but do no more.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.
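
The bookkeeping described above can be sketched as follows. This is a simplified model, not the kernel code: the struct and function names are hypothetical, and the decay step that the real change factors out of updatepri is left out.

```c
/* Hypothetical, reduced view of the per-process scheduler state. */
struct sketch_proc {
    unsigned estcpu;      /* estimated recent CPU usage */
    unsigned fork_estcpu; /* estcpu inherited from the parent at fork */
};

/* At fork: the child starts with the parent's estcpu and remembers it. */
void sketch_fork_hook(const struct sketch_proc *parent,
                      struct sketch_proc *child)
{
    child->estcpu = parent->estcpu;
    child->fork_estcpu = parent->estcpu;
}

/*
 * At wait: charge the parent only for what the child accumulated itself.
 * Adding child->estcpu in full would hand the inherited share back to the
 * parent and double its estcpu on every fork-wait cycle.
 */
void sketch_wait_hook(struct sketch_proc *parent,
                      const struct sketch_proc *child)
{
    if (child->estcpu > child->fork_estcpu)
        parent->estcpu += child->estcpu - child->fork_estcpu;
}
```

With this scheme a child that exits without using any CPU leaves the parent's estcpu unchanged.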


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu are not likely increased.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
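
To see why a fixed-point representation decays more gently, consider the classic 4BSD decay factor 2*load / (2*load + 1). The sketch below assumes that formula and an 11-bit fraction; the names sk_decay, SK_FSHIFT and SK_TO_FIXPT are hypothetical and the code is not taken from the kernel.

```c
#include <stdint.h>

/* Assumed number of fraction bits for the fixed-point form. */
#define SK_FSHIFT 11
#define SK_TO_FIXPT(x) ((uint32_t)(x) << SK_FSHIFT)

/*
 * One decay step: value * 2*load / (2*load + 1), truncating.
 * The same arithmetic serves both a plain integer estcpu and a
 * fixed-point one; only the scale of the input differs.
 */
uint32_t sk_decay(uint32_t value, uint32_t load)
{
    return (uint32_t)((uint64_t)value * 2 * load / (2 * load + 1));
}
```

With a load of 20, sk_decay(10, 20) truncates 9.756 to 9, so a plain integer loses a whole unit per step no matter how high the load is. The fixed-point input SK_TO_FIXPT(10), which is 20480, decays to 19980, or about 9.756 when scaled back, so the fraction survives from one step to the next.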


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
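
A marker-skipping iteration macro of this kind can be sketched as below. This is a self-contained model under stated assumptions: it uses a hand-rolled singly linked list rather than the <sys/queue.h> macros, and the names sk_proc, SK_MARKER and SK_PROCLIST_FOREACH are hypothetical.

```c
#include <stddef.h>

/* Hypothetical flag identifying a placeholder (marker) entry. */
#define SK_MARKER 0x1

struct sk_proc {
    int pid;
    int flags;
    struct sk_proc *next;
};

/*
 * Visit every real entry on the list and skip marker entries.
 * The "if ... continue; else" form lets the caller's loop body follow
 * the macro as a single statement or a braced block.
 */
#define SK_PROCLIST_FOREACH(var, head)                          \
    for ((var) = (head); (var) != NULL; (var) = (var)->next)    \
        if (((var)->flags & SK_MARKER) != 0)                    \
            continue;                                           \
        else

/* Example user: count the real processes on a list. */
int sk_count(struct sk_proc *head)
{
    struct sk_proc *p;
    int n = 0;

    SK_PROCLIST_FOREACH(p, head) {
        n++;
    }
    return n;
}
```

Code that walks the list through the macro never sees the placeholders, so an iterator is free to leave a marker on the list to remember its position.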


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep while awaiting the debugger's opinion
about the signal; this is not a problem. The hook is hence located in
issignal, at the place where normally SIGCHLD is sent to the debugger,
while the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception), and stop the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

If sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)
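
The first line of this entry swaps a stored count for an emptiness test on the list itself. A minimal userland sketch of that idea, assuming the LIST macros from <sys/queue.h>; struct ras_sketch, struct proc_sketch and proc_has_ras() are illustrative names, not the kernel's definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Illustrative stand-ins; the real struct ras and struct proc differ. */
struct ras_sketch {
	LIST_ENTRY(ras_sketch) ras_list;
};

struct proc_sketch {
	LIST_HEAD(, ras_sketch) p_raslist;
};

/* No separate counter to keep in sync: ask the list head directly. */
static int
proc_has_ras(struct proc_sketch *p)
{
	assert(p != NULL);
	return !LIST_EMPTY(&p->p_raslist);
}
```

The appeal of this shape is that the list head is the only state, so a count can never drift out of step with the list contents.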


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.
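
The extra-argument convention described in this entry can be sketched in userland as follows. This is only an illustration under stated assumptions: mi_switch_sketch(), chooseproc_sketch() and the one-slot runq_head are hypothetical stand-ins, no context switch happens, and the stand-in for chooseproc() simply returns whatever is queued instead of waiting.

```c
#include <assert.h>
#include <stddef.h>

struct proc_sketch {
	int p_pid;
};

/* Hypothetical one-slot "run queue" standing in for the real queues. */
static struct proc_sketch *runq_head;

/* Stand-in for chooseproc(): the real one waits for a runnable process. */
static struct proc_sketch *
chooseproc_sketch(void)
{
	return runq_head;
}

/* Returns the process that would run next; performs no real switch. */
static struct proc_sketch *
mi_switch_sketch(struct proc_sketch *cur, struct proc_sketch *newp)
{
	assert(cur != NULL);
	if (newp == NULL)
		newp = chooseproc_sketch();
	if (newp == NULL || newp == cur)
		return cur;	/* "switching to self": nothing to do */
	return newp;
}
```

Passing NULL keeps the old "let the scheduler decide" behaviour, while a non-NULL argument lets the caller name the next process directly.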


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct emul to provide an emulation-specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
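
The "NULL means use the default handler" convention for e_fault can be sketched like this. It is a hedged userland illustration only: the *_sketch types and functions are hypothetical, the argument list is cut down from the prototypes quoted above, and p_last_handler exists purely so the chosen path is observable.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long vaddr_sketch_t;
struct proc_sketch;

struct emul_sketch {
	/* NULL means "use the default handler". */
	int (*e_fault)(struct proc_sketch *, vaddr_sketch_t);
};

struct proc_sketch {
	const struct emul_sketch *p_emul;
	int p_last_handler;	/* records which path ran, for illustration */
};

/* Plays the role of the default (uvm_fault-like) path. */
static int
default_fault_sketch(struct proc_sketch *p, vaddr_sketch_t va)
{
	(void)va;
	p->p_last_handler = 1;
	return 0;
}

/* Plays the role of an emulation-specific handler. */
static int
emul_fault_sketch(struct proc_sketch *p, vaddr_sketch_t va)
{
	(void)va;
	p->p_last_handler = 2;
	return 0;
}

/* What a trap handler might do: prefer the hook, else fall back. */
static int
trap_fault_sketch(struct proc_sketch *p, vaddr_sketch_t va)
{
	assert(p != NULL && p->p_emul != NULL);
	if (p->p_emul->e_fault != NULL)
		return p->p_emul->e_fault(p, va);
	return default_fault_sketch(p, va);
}
```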


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.
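
The flag-guarded cleanup described in this entry might look roughly like the sketch below. The names are assumptions made for illustration (PF_CHTRACED_SKETCH, p_opptr and the plain array standing in for the process list); only the logic of "scan everything, but only when the flag says a child was reparented away" comes from the text.

```c
#include <assert.h>
#include <stddef.h>

#define PF_CHTRACED_SKETCH 0x1	/* hypothetical "a child was reparented" flag */

struct proc_sketch {
	int p_flag;
	struct proc_sketch *p_opptr;	/* old parent, or NULL */
};

/* Called on exit: drop every stale old-parent pointer to `exiting`. */
static size_t
clear_old_parent_sketch(struct proc_sketch *exiting,
    struct proc_sketch **all, size_t n)
{
	size_t i, cleared = 0;

	assert(exiting != NULL);
	if ((exiting->p_flag & PF_CHTRACED_SKETCH) == 0)
		return 0;	/* common case: no scan needed */
	for (i = 0; i < n; i++) {
		if (all[i]->p_opptr == exiting) {
			all[i]->p_opptr = NULL;
			cleared++;
		}
	}
	return cleared;
}
```

Holding a pointer rather than a pid is what makes the reset necessary: a stale pid is merely wrong, while a stale pointer would be dereferenced after the old parent is gone.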


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.
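
The version-based trampoline choice in the bullets above can be illustrated with a small sketch. The struct layout and field names (sd_handler, sd_tramp, sd_vers) are assumptions for illustration, not the actual sigact_sigdesc definition; the only rule taken from the text is that version 0 selects the legacy kernel-provided trampoline.

```c
#include <assert.h>
#include <stddef.h>

struct sigdesc_sketch {
	void (*sd_handler)(int);
	const void *sd_tramp;	/* userland-supplied trampoline, if any */
	int sd_vers;		/* 0 = legacy kernel-provided trampoline */
};

/* Pick the trampoline a sendsig()-like routine would return through. */
static const void *
select_tramp_sketch(const struct sigdesc_sketch *sd, const void *kernel_tramp)
{
	assert(sd != NULL);
	if (sd->sd_vers == 0)
		return kernel_tramp;
	return sd->sd_tramp;
}
```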


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specificities should now be handled in OS specific code. Linux
has been done, but other emulations should be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data that has been collected so far:

                                   Net Free Open Linux SunOS AIX OSF1 Darwin
send SIGIO to write end of pipe    Y   N    N    N     N     N   Y    Y
send SIGIO to read end of pipe     Y   Y    N    N     N     ?   Y    ?
send SIGIO to write end of socket  Y   Y    Y    N     N     Y   Y    Y
send SIGIO to read end of socket   Y   Y    Y    Y     Y     ?   Y    ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke several
emulation package build


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.
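
The exec-time behaviour of these hooks, including the call to the old emulation's exit hook when the emulation changes, can be sketched as below. Everything with a _sketch suffix is hypothetical, the call counters exist only for illustration, and the relative ordering of the two hook calls is an assumption rather than something the entry states.

```c
#include <assert.h>
#include <stddef.h>

struct proc_sketch;

struct emul_sketch {
	void (*e_proc_exec)(struct proc_sketch *);
	void (*e_proc_exit)(struct proc_sketch *);
};

struct proc_sketch {
	const struct emul_sketch *p_emul;
	void *p_emuldata;	/* per-process emulation-specific data */
	int p_exec_calls;	/* counters exist only for illustration */
	int p_exit_calls;
};

static void count_exec(struct proc_sketch *p) { p->p_exec_calls++; }
static void count_exit(struct proc_sketch *p) { p->p_exit_calls++; }

/* The exec-time part only; the ordering of the two hooks is an assumption. */
static void
exec_set_emul_sketch(struct proc_sketch *p, const struct emul_sketch *newemul)
{
	assert(p != NULL && p->p_emul != NULL && newemul != NULL);
	if (p->p_emul != newemul && p->p_emul->e_proc_exit != NULL)
		p->p_emul->e_proc_exit(p);	/* let the old emulation clean up */
	p->p_emul = newemul;
	if (newemul->e_proc_exec != NULL)
		newemul->e_proc_exec(p);
}
```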


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
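
Keeping the old API as a macro over the new function can be sketched as follows. This is a userland illustration under assumptions: ltsleep_sketch() does no sleeping at all, struct simplelock_sketch is a stand-in, and the bookkeeping variables exist only so the behaviour can be observed.

```c
#include <assert.h>
#include <stddef.h>

struct simplelock_sketch {
	int locked;
};

/* Records the last interlock seen, purely so the sketch is observable. */
static struct simplelock_sketch *last_interlock;
static int ltsleep_calls;

static int
ltsleep_sketch(const void *chan, int pri, const char *wmesg, int timo,
    struct simplelock_sketch *interlock)
{
	assert(chan != NULL);
	(void)pri;
	(void)wmesg;
	(void)timo;
	ltsleep_calls++;
	last_interlock = interlock;
	if (interlock != NULL)
		interlock->locked = 0;	/* dropped only after the sleep is set up */
	return 0;
}

/* Old API preserved as a macro: same call, no interlock. */
#define tsleep_sketch(chan, pri, wmesg, timo) \
	ltsleep_sketch((chan), (pri), (wmesg), (timo), NULL)
```

Existing callers compile unchanged through the macro, while new callers can hand over a lock that is released only once the sleep has been set up, which is the window the entry is concerned with.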


# 1.97 31-May-2000 thorpej

Track which CPU a process is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.
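
The element-size argument described for KERN_PROC2 is what lets old binaries keep working as the structure grows. A minimal sketch of that copy-out rule, assuming a hypothetical struct kinfo_sketch and helper copy_records_sketch() (this is not the kernel's sysctl code):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record; imagine p_newer was appended in a later release. */
struct kinfo_sketch {
	int p_pid;
	int p_ppid;
	int p_newer;
};

/*
 * Copy up to elemcount records into dst, each occupying elemsize bytes.
 * A caller built against an older, shorter layout gets a truncated prefix;
 * a caller with a longer layout gets zero padding after the known fields.
 */
static size_t
copy_records_sketch(const struct kinfo_sketch *src, size_t nsrc,
    void *dst, size_t elemsize, size_t elemcount)
{
	unsigned char *out = dst;
	size_t n = nsrc < elemcount ? nsrc : elemcount;
	size_t ncopy = elemsize < sizeof(*src) ? elemsize : sizeof(*src);
	size_t i;

	assert(src != NULL && dst != NULL);
	for (i = 0; i < n; i++) {
		memset(out + i * elemsize, 0, elemsize);
		memcpy(out + i * elemsize, &src[i], ncopy);
	}
	return n;
}
```

This only works if new fields are always appended at the end, so that every older layout is a prefix of the current one.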


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.
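
One way such tracking could work is with two flag bits: the first round-robin tick marks the process as seen, and a second tick without an intervening switch marks it as one that should yield. The sketch below is an assumption about the mechanism, with hypothetical flag and function names; the entry itself does not spell out the implementation.

```c
#include <assert.h>
#include <stddef.h>

#define SCHED_SEENRR_SKETCH      0x1u	/* saw one round-robin tick already */
#define SCHED_SHOULDYIELD_SKETCH 0x2u	/* stayed on CPU for a full cycle */

/* Called once per round-robin interval for the running process. */
static void
roundrobin_tick_sketch(unsigned *flags)
{
	assert(flags != NULL);
	if (*flags & SCHED_SEENRR_SKETCH)
		*flags |= SCHED_SHOULDYIELD_SKETCH;
	else
		*flags |= SCHED_SEENRR_SKETCH;
}

/* Called when the process gives up the CPU: start counting afresh. */
static void
switched_sketch(unsigned *flags)
{
	assert(flags != NULL);
	*flags &= ~(SCHED_SEENRR_SKETCH | SCHED_SHOULDYIELD_SKETCH);
}

static int
should_yield_sketch(unsigned flags)
{
	return (flags & SCHED_SHOULDYIELD_SKETCH) != 0;
}
```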


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
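
The sizing rule in the first paragraph of this entry (a quarter of physical memory, clamped by machine-dependent bounds) is simple enough to sketch directly. The function name and the choice of pages as the unit are assumptions for illustration; the bounds play the role of NKMEMPAGES_MIN and NKMEMPAGES_MAX.

```c
#include <assert.h>

/* Quarter of physical memory, clamped to [lo, hi]; all values in pages. */
static unsigned long
nkmempages_sketch(unsigned long physpages, unsigned long lo, unsigned long hi)
{
	unsigned long n;

	assert(lo <= hi);
	n = physpages / 4;
	if (n < lo)
		n = lo;
	if (n > hi)
		n = hi;
	return n;
}
```

For example, with bounds of 1024 and 8192 pages, a machine with 16384 physical pages would get 4096, while very small or very large machines would land on the lower or upper bound.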


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). A process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
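
A macro of the kind described might look like the sketch below. The state values and the struct are placeholders chosen for illustration; only the names SZOMB, SDEAD and P_ZOMBIE() come from the commit message.

```c
/* Illustrative state values; the real constants live in <sys/proc.h>. */
enum { SIDL = 1, SRUN, SSLEEP, SSTOP, SZOMB, SDEAD };

/* Hypothetical stand-in for struct proc with only the field used here. */
struct proc_model {
	int p_stat;
};

/* Treat a process that is dead or already a zombie as a zombie. */
#define P_ZOMBIE(p)	((p)->p_stat == SZOMB || (p)->p_stat == SDEAD)
```

Callers that previously compared p_stat against SZOMB can use the macro so that they also skip processes still waiting for the reaper.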


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).
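
The two exit-signal rules listed above can be sketched as follows. The struct and function names are hypothetical; the sketch only assumes that an exit signal of 0 means "deliver nothing" and that reparenting to init switches the exit signal to SIGCHLD.

```c
#include <signal.h>

/* Hypothetical stand-in for the fields involved. */
struct proc_model {
	int p_exitsig;		/* signal sent to the parent on exit, 0 = none */
	int is_initproc;	/* nonzero for the init process */
};

/* When a child is handed to init, init always gets SIGCHLD. */
static void
model_reparent(struct proc_model *child, const struct proc_model *newparent)
{
	if (newparent->is_initproc)
		child->p_exitsig = SIGCHLD;
}

/* Signal to post to the parent at exit time, or 0 for no signal. */
static int
model_exit_signal(const struct proc_model *child)
{
	return child->p_exitsig;
}
```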


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.
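
As a sketch of what "accounts for stack direction" means, the helper below turns a (low address, size) pair into an initial stack pointer. The function name and the grows_down parameter are assumptions for illustration; real machine-dependent code would also apply alignment and frame setup.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick the initial stack pointer for a stack
 * area given as its lowest address and its size.
 */
static uintptr_t
initial_sp(uintptr_t base, size_t size, int grows_down)
{
	/* A downward-growing stack starts at the top of the area. */
	return grows_down ? base + size : base;
}
```

Describing the area the same way on every port keeps the caller machine-independent.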

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is the
header files as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.
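
The descriptor table described above might be laid out roughly as in this sketch. The field name pd_list, the NULL terminator, and the helper count_proclists() are assumptions for illustration; only proclist_desc, proclists[], allproc, deadproc and zombproc are named in the commit message.

```c
#include <stddef.h>

struct proc_model;			/* opaque in this sketch */

/* Hypothetical list head, standing in for a LIST_HEAD of procs. */
struct proclist_model {
	struct proc_model *lh_first;
};

/* One descriptor per process list. */
struct proclist_desc_model {
	struct proclist_model *pd_list;	/* the list itself */
	/* A pointer to the lock for this list would be added here later. */
};

static struct proclist_model allproc, deadproc, zombproc;

/* Static table of all process lists, NULL-terminated (an assumption). */
static const struct proclist_desc_model proclists[] = {
	{ &allproc },
	{ &deadproc },
	{ &zombproc },
	{ NULL },
};

/* Walk the table the way generic code could visit every list. */
static size_t
count_proclists(void)
{
	size_t n = 0;

	while (proclists[n].pd_list != NULL)
		n++;
	return n;
}
```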


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
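
A short sketch of why an offset of 20 keeps p_nice non-negative: the user-visible nice value is biased by the offset before it is stored in an unsigned char. The MODEL_NZERO name, the helper functions, and the assumed user range of -20 to 20 are illustrative only.

```c
/* Bias applied to the user-visible nice value (NZERO in the commit). */
#define MODEL_NZERO	20

/* Store a user nice value (assumed range -20..20) as an unsigned p_nice. */
static unsigned char
nice_to_pnice(int nice)
{
	return (unsigned char)(nice + MODEL_NZERO);
}

/* Recover the user-visible nice value from the stored p_nice. */
static int
pnice_to_nice(unsigned char p_nice)
{
	return (int)p_nice - MODEL_NZERO;
}
```

With the bias in place the most favourable value, -20, is stored as 0, so the unsigned type never has to represent a negative number.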


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.
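
A simplified model of hold/release macros built on a hold count is sketched below. The field name p_holdcnt, the MODEL_ prefixes, and the swappable check are assumptions for illustration; the real macros may do more, such as bringing the process back into core.

```c
/* Hypothetical stand-in for struct proc with only a hold count. */
struct proc_model {
	int p_holdcnt;		/* > 0 while the process must stay in core */
};

/* Take and drop a hold on the process. */
#define MODEL_PHOLD(p)		((void)((p)->p_holdcnt++))
#define MODEL_PRELE(p)		((void)(--(p)->p_holdcnt))

/* The process may be swapped out only when nobody holds it. */
#define MODEL_SWAPPABLE(p)	((p)->p_holdcnt == 0)
```

A counter, unlike a single flag, lets several independent holders overlap without one release cancelling another's hold.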


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location, and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


Revision tags: prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.340 30-Mar-2017 christos

factor out getauxv code.


# 1.339 24-Mar-2017 christos

Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.


Revision tags: pgoyette-localcount-20170320
# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API and its usage.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, so remove its traces without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering
machine-specific register macros is that these values vary across generations
of the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

branches: 1.337.2;
Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
	int pe_report_event;
	pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
	int pe_report_event;
	union {
		pid_t _pe_other_pid;
		lwpid_t _pe_lwp;
	} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
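
The size and layout claim can be checked at compile time, as in the sketch below. The model_ and old_/new_ names are stand-ins introduced here, and both ID types are assumed to be int32_t as the message suggests; this is not code from the commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: both ID types are 32-bit integers. */
typedef int32_t model_pid_t;
typedef int32_t model_lwpid_t;

/* Layout before the change. */
struct old_ptrace_state {
	int pe_report_event;
	model_pid_t pe_other_pid;
};

/* Layout after the change: the pid shares storage with the LWP id. */
struct new_ptrace_state {
	int pe_report_event;
	union {
		model_pid_t _pe_other_pid;
		model_lwpid_t _pe_lwp;
	} _option;
};

/* Binary compatibility: same size and same offset for the payload. */
_Static_assert(sizeof(struct old_ptrace_state) ==
    sizeof(struct new_ptrace_state), "ptrace_state size changed");
_Static_assert(offsetof(struct old_ptrace_state, pe_other_pid) ==
    offsetof(struct new_ptrace_state, _option), "payload offset changed");

/* Source compatibility: old member names keep working on the new struct. */
#define pe_other_pid	_option._pe_other_pid
#define pe_lwp		_option._pe_lwp
```

The accessor macros are defined after the assertions because, once defined, pe_other_pid no longer names the plain member of the old layout.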


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP termination is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when a
parent creates a new child process and stops until the child exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE notifies a debugger that a parent has
resumed after vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
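
A macro along these lines could select the pointer size from a per-process flag. This is a sketch only: the MODEL_ names, the flag value, and the choice of uint32_t for the 32-bit case are assumptions, not the definition from <sys/proc.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bit marking a process that runs 32-bit code. */
#define MODEL_PK_32	0x00000004

/* Hypothetical stand-in for struct proc with only the flag word. */
struct proc_model {
	int p_flag;
};

/* Size of a userland pointer as seen by the given process. */
#define MODEL_PROC_PTRSZ(p) \
	(((p)->p_flag & MODEL_PK_32) ? sizeof(uint32_t) : sizeof(void *))
```

Code that copies pointer arrays to or from userland can then step by this size instead of assuming the kernel's own pointer width.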


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
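
To show how the three pieces relate to the old composite value, the sketch below rebuilds a wait(2)-style status from them. It assumes the traditional encoding (signal in the low 7 bits, core flag 0200, exit code in bits 8-15) and uses a separate p_coredumped field as a simplification of the flag bit; all names with a model prefix are hypothetical.

```c
/* Traditional core-dump bit in a wait status (assumed encoding). */
#define MODEL_WCOREFLAG	0200

/* Hypothetical stand-in for the split fields. */
struct proc_model {
	int p_xexit;		/* exit code */
	int p_xsig;		/* terminating signal, 0 if exited normally */
	int p_coredumped;	/* stands in for the flag bit in p_sflag */
};

/* Compose a wait(2)-style status word from the separate fields. */
static int
model_wait_status(const struct proc_model *p)
{
	if (p->p_xsig != 0) {
		return (p->p_xsig & 0177) |
		    (p->p_coredumped ? MODEL_WCOREFLAG : 0);
	}
	return (p->p_xexit & 0xff) << 8;
}
```

Keeping the fields apart means no caller has to guess whether the stored value is a status word or a bare signal number.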


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits execution of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture,
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update a few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.
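
As a rough illustration of the idea (not the actual kernel code), the sketch below keeps a copy of the parent's PID in the child so that the common read path never takes the global lock. The struct, the field name p_ppid_cache, the sketch_* functions and the lock counter are all hypothetical stand-ins; only the update path, which already holds the lock, refreshes the cached value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct proc. */
struct sketch_proc {
	int			 p_pid;
	struct sketch_proc	*p_pptr;	/* parent; follow only with the lock held */
	atomic_int		 p_ppid_cache;	/* copy of p_pptr->p_pid, readable lock-free */
};

/* Counts acquisitions of the (imaginary) global process lock. */
static unsigned lock_acquisitions;

static void sketch_lock(void)   { lock_acquisitions++; }
static void sketch_unlock(void) { /* nothing to do in a single-threaded sketch */ }

/* Update path: the parent pointer and the cached PID change together. */
void
sketch_reparent(struct sketch_proc *child, struct sketch_proc *parent)
{
	assert(child != NULL && parent != NULL);
	sketch_lock();
	child->p_pptr = parent;
	atomic_store(&child->p_ppid_cache, parent->p_pid);
	sketch_unlock();
}

/* Read path: no lock needed, just load the cached value. */
int
sketch_getppid(struct sketch_proc *p)
{
	return atomic_load(&p->p_ppid_cache);
}

/* Reparent twice and report the final parent PID seen by the child. */
int
sketch_demo(void)
{
	struct sketch_proc init = { .p_pid = 1 };
	struct sketch_proc shell = { .p_pid = 42 };
	struct sketch_proc child = { .p_pid = 43 };

	sketch_reparent(&child, &shell);
	if (sketch_getppid(&child) != 42)
		return -1;
	sketch_reparent(&child, &init);	/* e.g. the original parent exited */
	return sketch_getppid(&child);
}
```

In this sketch the lock is taken only by sketch_reparent(); however often sketch_getppid() is called, the acquisition count does not change.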


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.
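
A minimal sketch of what maintaining a process count with atomics can look like, written with C11 <stdatomic.h> so it builds in userland. The kernel has its own atomic_ops(3) primitives; the names below (sketch_nprocs, sketch_proc_enter, sketch_proc_leave) and the increment-then-check policy are assumptions for illustration, not taken from the commit.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical global process counter, updated without any lock. */
static atomic_uint sketch_nprocs;

/*
 * Account for a new process.  Increment first, then check the limit and
 * roll back on overflow, so no lock is needed around the counter.
 * Returns 1 on success, 0 if the limit would be exceeded.
 */
int
sketch_proc_enter(unsigned limit)
{
	unsigned count = atomic_fetch_add(&sketch_nprocs, 1) + 1;

	if (count > limit) {
		atomic_fetch_sub(&sketch_nprocs, 1);
		return 0;
	}
	return 1;
}

/* Account for a process going away. */
void
sketch_proc_leave(void)
{
	unsigned old = atomic_fetch_sub(&sketch_nprocs, 1);

	assert(old > 0);	/* the counter must never underflow */
}

/* Read the current count. */
unsigned
sketch_proc_count(void)
{
	return atomic_load(&sketch_nprocs);
}
```

Between the increment and the rollback the counter can briefly sit above the limit, which is usually acceptable for a statistic of this kind.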


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.
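
The first item above describes a class of bug that is easy to reproduce in miniature. The sketch below is an assumption about the general shape of such a bug, not the actual _lwp_park() code: if a relative timeout is converted to whole microseconds by truncation, and 0 is the value that means "no timeout", any interval shorter than 1 microsecond silently becomes an untimed sleep. Rounding up avoids that. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* In this sketch a timeout of 0 means "sleep with no timeout". */
#define SKETCH_NO_TIMEOUT	0

/* Buggy shape: truncates, so 1..999 ns collapses to 0 (untimed sleep). */
uint64_t
sketch_timo_truncate(int64_t sec, long nsec)
{
	assert(sec >= 0 && nsec >= 0 && nsec < 1000000000L);
	return (uint64_t)sec * 1000000u + (uint64_t)nsec / 1000u;
}

/*
 * Safer shape: round up, so any positive interval yields at least 1 us.
 * An interval of exactly zero still yields 0; a caller would need to
 * treat "deadline already reached" as a separate case before sleeping.
 */
uint64_t
sketch_timo_roundup(int64_t sec, long nsec)
{
	assert(sec >= 0 && nsec >= 0 && nsec < 1000000000L);
	return (uint64_t)sec * 1000000u + ((uint64_t)nsec + 999u) / 1000u;
}
```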


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give the same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time.


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but no longer do.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu are not likely increased.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
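
To see why a fixed-point p_estcpu helps, here is a small sketch of one decay step. It assumes the classic 4.4BSD-style formula cpu * loadfac / (loadfac + FSCALE) with loadfac = 2 * load, and an 11-bit fraction; the SKETCH_* names and both functions are illustrative, not the kernel's own definitions. The same function is applied once to a whole-number estcpu and once to the same value scaled by FSCALE.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FSHIFT	11
#define SKETCH_FSCALE	(1u << SKETCH_FSHIFT)

/* Load average (a whole number here) turned into a fixed-point load factor. */
uint64_t
sketch_loadfactor(unsigned load)
{
	return 2u * (uint64_t)load * SKETCH_FSCALE;
}

/* One decay step; the integer division truncates toward zero. */
uint64_t
sketch_decay(uint64_t loadfac, uint64_t cpu)
{
	return (loadfac * cpu) / (loadfac + SKETCH_FSCALE);
}
```

With a load average of 20, a whole-number estcpu of 10 decays to 9 and an estcpu of 1 decays to 0, so every step costs at least one full unit. Scaled by FSCALE, the same 10 decays to 19980 (about 9.76), so the loss per step shrinks as the load grows.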


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep awaiting the debugger's opinion
about the signal without it being a problem. The hook is hence located in
issignal, at the place where normally SIGCHLD is sent to the debugger
while the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception) and stopping the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.
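
The control flow described above (an explicit next process, with a fallback to chooseproc() when NULL is passed, plus the "switching to self" shortcut) might look roughly like this. It is a sketch only: all `_sketch` names and types are hypothetical, and the stand-in for chooseproc() returns NULL on an empty queue instead of waiting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced process and run queue types. */
struct proc_sketch {
	int p_pid;
	struct proc_sketch *p_next;
};

struct runq_sketch {
	struct proc_sketch *head;
};

/*
 * Stand-in for chooseproc(): take the first process off the run queue.
 * The real function waits for a process to appear; this one returns NULL.
 */
struct proc_sketch *
chooseproc_sketch(struct runq_sketch *rq)
{
	struct proc_sketch *p = rq->head;

	if (p != NULL) {
		rq->head = p->p_next;
		p->p_next = NULL;
	}
	return p;
}

/*
 * Pick the process to run next. If the caller did not name one, ask the
 * scheduler. *need_ctxsw is cleared when we would switch to ourselves.
 */
struct proc_sketch *
mi_switch_sketch(struct runq_sketch *rq, struct proc_sketch *cur,
    struct proc_sketch *newproc, int *need_ctxsw)
{
	assert(cur != NULL && need_ctxsw != NULL);
	if (newproc == NULL)
		newproc = chooseproc_sketch(rq);
	if (newproc == NULL)
		newproc = cur;	/* sketch only: nothing runnable, keep going */
	*need_ctxsw = (newproc != cur);
	return newproc;
}
```

Passing the target explicitly keeps the policy decision (which process) apart from the mechanism (the context switch itself).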


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct emul to provide an emulation specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
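
A rough sketch of how a trap handler could dispatch on e_fault, following the rule above that NULL means "use uvm_fault". The types are reduced stand-ins (unsigned long for vaddr_t, int for vm_fault_t and vm_prot_t), and every `_sketch`/`_stub` name, including the two emulation tables, is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct vm_map { int unused; };
struct proc;

/* Reduced struct emul: only the field this example needs. */
struct emul {
	const char *e_name;
	int (*e_fault)(struct proc *, unsigned long, int, int);
};

struct proc {
	const struct emul *p_emul;
	struct vm_map *p_map;
};

/* Stand-in for uvm_fault(): takes a map, always succeeds here. */
int
uvm_fault_stub(struct vm_map *map, unsigned long va, int ftype, int prot)
{
	(void)map; (void)va; (void)ftype; (void)prot;
	return 0;
}

/* Stand-in for an emulation-specific handler: takes a proc instead. */
int
share_group_fault_stub(struct proc *p, unsigned long va, int ftype, int prot)
{
	(void)p; (void)va; (void)ftype; (void)prot;
	return 1;	/* distinct value so the caller can tell which ran */
}

const struct emul emul_native_sketch = { "native", NULL };
const struct emul emul_hooked_sketch = { "hooked", share_group_fault_stub };

/* Use the emulation hook when present, otherwise the generic path. */
int
trap_fault_sketch(struct proc *p, unsigned long va, int ftype, int prot)
{
	assert(p != NULL && p->p_emul != NULL);
	if (p->p_emul->e_fault != NULL)
		return (*p->p_emul->e_fault)(p, va, ftype, prot);
	return uvm_fault_stub(p->p_map, va, ftype, prot);
}
```

Taking a process pointer rather than a map is what gives the hook enough context to find the other members of a share group.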


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.
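
As an illustration of the trampoline selection described above, here is a small sketch. The field names (sd_sigact, sd_tramp, sd_vers) and the reduced sigaction stand-in are assumptions for the example; only the rule that version 0 means the legacy kernel-provided trampoline comes from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for struct sigaction. */
struct sigaction_sketch {
	void (*sa_handler)(int);
	int sa_flags;
};

/* Hypothetical layout: the sigaction plus the trampoline and its version. */
struct sigact_sigdesc_sketch {
	struct sigaction_sketch sd_sigact;
	const void *sd_tramp;	/* userland-provided trampoline, if any */
	int sd_vers;		/* 0 = legacy kernel-provided trampoline */
};

/* Choose the trampoline a sendsig()-like function would return through. */
const void *
select_trampoline_sketch(const struct sigact_sigdesc_sketch *sd,
    const void *kernel_tramp)
{
	assert(sd != NULL);
	if (sd->sd_vers == 0)
		return kernel_tramp;
	return sd->sd_tramp;
}
```

Keeping the version next to the pointer lets the kernel and libc change the trampoline contract later without guessing which convention a given handler was registered with.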


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has an e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specificities should now be handled in OS specific code. Linux
has been done, but other emulations should be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data that has been collected so far:

                                   Net  Free  Open  Linux  SunOS  AIX  OSF1  Darwin
send SIGIO to write end of pipe    Y    N     N     N      N      N    Y     Y
send SIGIO to read end of pipe     Y    Y     N     N      N      ?    Y     ?
send SIGIO to write end of socket  Y    Y     Y     N      N      Y    Y     Y
send SIGIO to read end of socket   Y    Y     Y     Y      Y      ?    Y     ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke several
emulation package builds


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN, clear the P_SUSPEND
flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
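
A compatibility macro of the kind described above could be sketched like this. The `_sketch` names, the reduced lock type and the do-nothing function body are assumptions; the example only illustrates that the old four-argument call maps onto the new function with no interlock.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for a simple lock. */
struct simplelock_sketch {
	int locked;
};

/*
 * Stand-in for ltsleep(). Real code would lock the scheduler first and
 * only then release the interlock, which is what closes the window
 * between a sleeper and an awakener. Here we just drop the interlock.
 */
int
ltsleep_sketch(const void *chan, int pri, const char *wmesg, int timo,
    struct simplelock_sketch *interlock)
{
	assert(chan != NULL);
	(void)pri; (void)wmesg; (void)timo;
	if (interlock != NULL)
		interlock->locked = 0;
	return 0;
}

/* The old API, expressed as the new call with no interlock. */
#define tsleep_sketch(chan, pri, wmesg, timo) \
	ltsleep_sketch((chan), (pri), (wmesg), (timo), NULL)
```

Existing callers keep compiling unchanged, while new callers that hold a lock protecting the sleep condition can hand it over.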


# 1.97 31-May-2000 thorpej

Track which CPU a process is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created processes (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.
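
The element-size idea behind KERN_PROC2 can be illustrated with a short sketch: the caller states how large it believes each record is, and the producer copies at most that much per record. The record layout and the function name are hypothetical and far smaller than the real structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record; imagine `cpuid` was appended in a later release. */
struct kinfo_sketch {
	int pid;
	int ppid;
	int cpuid;
};

/*
 * Copy up to `nrec` records into `dst`, laying them out `elemsize` bytes
 * apart, as requested by the caller. Returns the number of records copied.
 * Older callers pass a smaller elemsize and simply never see newer fields.
 */
size_t
copy_records_sketch(void *dst, size_t dstlen, const struct kinfo_sketch *src,
    size_t nrec, size_t elemsize)
{
	char *out = dst;
	size_t n = elemsize < sizeof(*src) ? elemsize : sizeof(*src);
	size_t i;

	assert(elemsize > 0);
	for (i = 0; i < nrec && (i + 1) * elemsize <= dstlen; i++) {
		memset(out + i * elemsize, 0, elemsize);
		memcpy(out + i * elemsize, &src[i], n);
	}
	return i;
}
```

Because the stride comes from the caller, growing the structure does not shift the records an old binary expects, which is the "proc size mismatch" failure this avoids.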


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
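
The sizing rule above (physical memory divided by 4, then clamped to machine-dependent bounds) amounts to something like the following. The function name and parameter names are hypothetical; the bounds correspond in spirit to NKMEMPAGES_MIN and NKMEMPAGES_MAX.

```c
#include <assert.h>

/*
 * Sketch of the auto-sizing rule: a quarter of physical memory (in pages),
 * clamped to [minpages, maxpages].
 */
unsigned long
kmem_autosize_sketch(unsigned long physpages, unsigned long minpages,
    unsigned long maxpages)
{
	unsigned long n = physpages / 4;

	assert(minpages <= maxpages);
	if (n < minpages)
		n = minpages;
	if (n > maxpages)
		n = maxpages;
	return n;
}
```

For example, with 4000 physical pages and bounds of 100 and 2000 the result is 1000 pages; a small machine with 200 pages is raised to the lower bound.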


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows one to change the name of the core dump,
and to relocate it in a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resources limits). Process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.
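
The selection rule for wakeup_one() (highest priority first, and among equals the one first in line) can be sketched over a simple array. This is an illustration under assumptions: the sleeper structure is hypothetical, the queue is an array rather than the kernel's sleep queues, and a numerically lower value is taken to mean higher priority, as is traditional in BSD.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sleeper record; queue order is array order. */
struct sleeper_sketch {
	const void *chan;	/* identifier being slept on */
	int pri;		/* lower value = higher priority (assumed) */
	int awake;
};

/*
 * Wake exactly one sleeper on `chan`: the highest-priority one, and the
 * earliest in the queue on ties. Returns its index, or -1 if none matched.
 */
int
wakeup_one_sketch(struct sleeper_sketch *q, size_t n, const void *chan)
{
	int best = -1;
	size_t i;

	assert(q != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (q[i].awake || q[i].chan != chan)
			continue;
		/* Strict '<' keeps the earlier sleeper on equal priority. */
		if (best < 0 || q[i].pri < q[best].pri)
			best = (int)i;
	}
	if (best >= 0)
		q[best].awake = 1;
	return best;
}
```

Waking a single sleeper avoids having every waiter run only to find the resource already taken again.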


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock every where else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
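
A sketch of the state check and the reaper transition described above. The numeric state values and the helper names are assumptions for illustration; only the rule that SZOMB and SDEAD both count as "zombie", and that the reaper moves SDEAD to SZOMB, comes from the text.

```c
#include <assert.h>

/* Illustrative state values; the real numbering may differ. */
#define SRUN	2
#define SZOMB	5
#define SDEAD	6

struct proc_sketch {
	int p_stat;
};

/* A proc in SZOMB or SDEAD is treated as a zombie. */
#define P_ZOMBIE(p)	((p)->p_stat == SZOMB || (p)->p_stat == SDEAD)

/* The reaper has finished with a dead process: make it a true zombie. */
void
reaper_process_sketch(struct proc_sketch *p)
{
	assert(p->p_stat == SDEAD);
	p->p_stat = SZOMB;
}

/* Only true zombies are ready to be collected by wait4. */
int
wait4_can_collect_sketch(const struct proc_sketch *p)
{
	return p->p_stat == SZOMB;
}
```

Code that only cares whether a process is gone can use P_ZOMBIE(), while wait4 still distinguishes the two states.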


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).
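
The exit-signal behavior described above can be sketched as follows. The structure, the `_SKETCH` macro and the helper functions are hypothetical; the sketch only shows that an exit signal of 0 means no signal is posted, and that reparenting to init switches the exit signal to SIGCHLD.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

struct proc_sketch {
	int p_exitsig;	/* signal to post to the parent on exit; 0 = none */
};

/* Simplified accessor: no special-casing of init here. */
#define P_EXITSIG_SKETCH(p)	((p)->p_exitsig)

/* When a child is handed to init, init expects SIGCHLD. */
void
reparent_to_init_sketch(struct proc_sketch *p)
{
	assert(p != NULL);
	p->p_exitsig = SIGCHLD;
}

/*
 * Returns 1 and stores the signal number if an exit signal should be
 * posted to the parent, or 0 if none should be delivered at all.
 */
int
exit_signal_to_post_sketch(const struct proc_sketch *p, int *signo)
{
	assert(p != NULL && signo != NULL);
	if (P_EXITSIG_SKETCH(p) == 0)
		return 0;
	*signo = P_EXITSIG_SKETCH(p);
	return 1;
}
```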


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).
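
How machine-dependent code might turn a (low address, size) pair into an initial stack pointer can be sketched as below. The function name and the `grows_down` parameter are assumptions, and real code would also apply the platform's alignment rules, which are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute the child's initial stack pointer. A zero `stack_low` stands in
 * for the NULL case: the child inherits the parent's stack pointer.
 */
uintptr_t
initial_sp_sketch(uintptr_t stack_low, size_t stack_size, int grows_down,
    uintptr_t parent_sp)
{
	if (stack_low == 0)
		return parent_sp;
	assert(stack_size > 0);
	/* Downward-growing stacks start at the top of the area. */
	return grows_down ? stack_low + stack_size : stack_low;
}
```

Describing the area by its low address and size keeps the machine-independent interface the same on every port, and leaves the direction question to the code that knows the answer.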


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is all the
header files as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids use of
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
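
As a rough sketch of why NZERO = 20 keeps the stored value non-negative:
a user-visible nice value in the range -20..20 is stored biased by NZERO
in an unsigned char. The struct and helper names below are hypothetical,
and the -20..20 range is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>

#define NZERO		20	/* bias, as chosen in this revision */
#define SK_NICE_MIN	(-20)	/* assumed user-visible range */
#define SK_NICE_MAX	20

struct sk_proc {
	unsigned char p_nice;	/* stored as nice + NZERO, so 0..40 */
};

/* Clamp the requested nice value and store it with the NZERO bias. */
void
sk_set_nice(struct sk_proc *p, int nice)
{
	assert(p != NULL);
	if (nice < SK_NICE_MIN)
		nice = SK_NICE_MIN;
	if (nice > SK_NICE_MAX)
		nice = SK_NICE_MAX;
	p->p_nice = (unsigned char)(nice + NZERO);
}

/* Recover the user-visible nice value by removing the bias. */
int
sk_get_nice(const struct sk_proc *p)
{
	assert(p != NULL);
	return (int)p->p_nice - NZERO;
}
```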


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.
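
The hold/release pattern can be sketched as a simple counter. This is not
the macro text from this revision: the sk_ names are hypothetical, the
functions stand in for the macros, and the swap-in side effect a real
kernel would need is left out.

```c
#include <assert.h>
#include <stddef.h>

struct sk_proc {
	int p_holdcnt;		/* number of outstanding holds */
};

/* Take a hold: while the count is non-zero the process stays in core. */
void
sk_phold(struct sk_proc *p)
{
	assert(p != NULL);
	p->p_holdcnt++;
}

/* Drop a hold taken earlier; holds and releases must be balanced. */
void
sk_prele(struct sk_proc *p)
{
	assert(p != NULL && p->p_holdcnt > 0);
	p->p_holdcnt--;
}

/* A swapper would only consider processes nobody is holding. */
int
sk_swappable(const struct sk_proc *p)
{
	assert(p != NULL);
	return p->p_holdcnt == 0;
}
```

A counter, unlike a single flag bit, lets several independent holders
overlap without the first release undoing the others' holds.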


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location, and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.338 23-Feb-2017 kamil

Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface and its usage are modeled after the FreeBSD API.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, so remove its traces without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base
# 1.337 14-Jan-2017 kamil

Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
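
The size argument above can be checked at compile time. The sketch below
uses int32_t stand-ins for pid_t and lwpid_t (an assumption taken from the
"int32_t-like" wording in this message) and sk_ prefixed struct names so
that both layouts can coexist in one file; sk_lwp_roundtrip() is a
hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t sk_pid_t;	/* stand-in for pid_t */
typedef int32_t sk_lwpid_t;	/* stand-in for lwpid_t */

/* Old layout. */
struct sk_ptrace_state_old {
	int pe_report_event;
	sk_pid_t pe_other_pid;
};

/* New layout: the second member becomes a union of two same-sized ids. */
struct sk_ptrace_state_new {
	int pe_report_event;
	union {
		sk_pid_t _pe_other_pid;
		sk_lwpid_t _pe_lwp;
	} _option;
};

/* Same size and same member offset, so existing binaries keep working. */
static_assert(sizeof(struct sk_ptrace_state_old) ==
    sizeof(struct sk_ptrace_state_new), "size must not change");
static_assert(offsetof(struct sk_ptrace_state_old, pe_other_pid) ==
    offsetof(struct sk_ptrace_state_new, _option), "offset must not change");

/* Compatibility names, defined after the checks that use the old member. */
#define pe_other_pid	_option._pe_other_pid
#define pe_lwp		_option._pe_lwp

/* Store through the new name, read back through the old one. */
int32_t
sk_lwp_roundtrip(sk_lwpid_t lid)
{
	struct sk_ptrace_state_new ps = { .pe_report_event = 0 };

	ps.pe_lwp = lid;
	return ps.pe_other_pid;	/* same storage, same representation */
}
```

Because the macros expand to the union members, source code that still
writes ps.pe_other_pid compiles unchanged against the new structure.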


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP termination is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when
parent gives birth to new process child and stops till it exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE is a notification telling a debugger that a parent has
resumed after vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
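
A sketch of what such a helper might look like, assuming a per-process
flag marks 32-bit processes. SK_PK_32, struct sk_proc and sk_proc_ptrsz()
are hypothetical names for this example, not the macro added by this
revision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PK_32	0x0001	/* hypothetical "32-bit process" flag */

struct sk_proc {
	int p_flag;
};

/*
 * Size of a userland pointer for process p: 4 bytes for an emulated
 * 32-bit process, the kernel's native pointer size otherwise.
 */
size_t
sk_proc_ptrsz(const struct sk_proc *p)
{
	assert(p != NULL);
	return (p->p_flag & SK_PK_32) ? sizeof(uint32_t) : sizeof(void *);
}
```

Code that walks a user-space array of pointers, such as an argument
vector, would use this size as its stride instead of sizeof(void *).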


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits execution of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.
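
As an illustration of the "named initializers" item in the list above:
C99 designated initializers let each emulation name only the fields it
sets, and later additions to the structure do not silently shift
positional values. The structure below is a hypothetical cut-down
stand-in, not the real struct emul.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much reduced stand-in for an emulation descriptor. */
struct sk_emul {
	const char *e_name;	/* emulation name */
	const char *e_path;	/* emulation root, may be NULL */
	int e_flags;
	int e_nsysent;		/* number of system call entries */
};

/* Only the fields that matter are named; the rest are zero-initialized. */
const struct sk_emul sk_emul_demo = {
	.e_name = "demo",
	.e_nsysent = 4,
};

/* Returns the emulation's name, or "?" if none was set. */
const char *
sk_emul_name(const struct sk_emul *e)
{
	assert(e != NULL);
	return e->e_name != NULL ? e->e_name : "?";
}
```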


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.
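
A minimal sketch of maintaining a process count without a lock, written
with C11 <stdatomic.h> rather than the kernel's own atomic primitives
(an assumption of this example). The counter and function names are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical global process counter; zero-initialized. */
atomic_uint sk_nprocs;

/*
 * Try to account for one more process. Fails, leaving the counter
 * untouched, if that would exceed maxproc.
 */
bool
sk_nprocs_try_inc(unsigned maxproc)
{
	unsigned cur = atomic_load(&sk_nprocs);

	do {
		if (cur >= maxproc)
			return false;
		/* On failure the CAS reloads cur and the check is redone. */
	} while (!atomic_compare_exchange_weak(&sk_nprocs, &cur, cur + 1));
	return true;
}

/* Account for a process going away. */
void
sk_nprocs_dec(void)
{
	unsigned old = atomic_fetch_sub(&sk_nprocs, 1);

	assert(old > 0);
	(void)old;	/* silence unused warning when NDEBUG is defined */
}
```

The compare-and-swap loop keeps the limit check and the increment a
single atomic step, which a separate load followed by an add would not.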


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, the lock is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes to sleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.
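
A minimal sketch of the class of bug described in the first item above,
assuming a sleep primitive where a timeout of 0 ticks means "no timeout".
The helper name timeout_to_ticks() and its return convention are
hypothetical and are not taken from the actual fix:

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Hypothetical helper: convert a relative timeout in nanoseconds to a
 * tick count for a sleep primitive where 0 ticks means "no timeout".
 * Assumes 0 < hz <= NSEC_PER_SEC. Returns -1 if the deadline has
 * already passed.
 */
static int
timeout_to_ticks(int64_t nsec, int hz)
{
	int64_t ticks;

	if (nsec <= 0)
		return -1;		/* caller should not sleep at all */
	ticks = nsec / (NSEC_PER_SEC / hz);
	if (ticks == 0)
		ticks = 1;		/* a short timeout must not become "forever" */
	if (ticks > INT32_MAX)
		ticks = INT32_MAX;
	return (int)ticks;
}
```

Without the clamp to 1, a sub-tick timeout would round down to 0 and be
indistinguishable from a request to sleep with no timeout.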


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time.


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's a more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout from the proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but no longer do.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
vmspace is the more natural way to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
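
a minimal sketch of why a fixed-point p_estcpu can decay more slowly under
high load, assuming the classic 4BSD decay factor 2*load / (2*load + 1).
the FSHIFT value and the helper names decay_factor() and decay_estcpu()
are illustrative assumptions, not the code from this change:

```c
#include <stdint.h>

/* Assumed fixed-point layout: FSHIFT fractional bits (illustrative value). */
#define FSHIFT 11
#define FSCALE (1u << FSHIFT)

typedef uint32_t fixpt_t;

/* Hypothetical helper: decay factor 2*load / (2*load + 1), in fixed point. */
static fixpt_t
decay_factor(fixpt_t loadavg)
{
	uint64_t twice = 2 * (uint64_t)loadavg;

	return (fixpt_t)((twice << FSHIFT) / (twice + FSCALE));
}

/* Hypothetical helper: apply one decay step to a fixed-point estcpu. */
static fixpt_t
decay_estcpu(fixpt_t estcpu, fixpt_t loadavg)
{
	return (fixpt_t)(((uint64_t)estcpu * decay_factor(loadavg)) >> FSHIFT);
}
```

with a load average of 16, an estcpu of 1.0 (FSCALE) decays to roughly 0.97
in this sketch, whereas an integer estcpu of 1 that must drop by at least 1
would go straight to 0.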


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
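
a minimal sketch of the marker-skipping idea, using a hypothetical singly
linked list. struct entry, skip_markers() and ENTRY_FOREACH are
illustrative names only; the real PROCLIST_FOREACH operates on the
kernel's proclist and may be implemented differently:

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a process list entry. */
struct entry {
	struct entry *next;
	int is_marker;	/* nonzero for placeholder entries left by an iterator */
	int pid;
};

/* Advance from e to the first non-marker entry at or after it. */
static struct entry *
skip_markers(struct entry *e)
{
	while (e != NULL && e->is_marker)
		e = e->next;
	return e;
}

/* Hypothetical macro in the spirit of PROCLIST_FOREACH: visit real entries only. */
#define ENTRY_FOREACH(var, head)				\
	for ((var) = skip_markers(head); (var) != NULL;		\
	    (var) = skip_markers((var)->next))

/* Example consumer: sum the pids of all non-marker entries. */
static int
sum_pids(struct entry *head)
{
	struct entry *e;
	int sum = 0;

	ENTRY_FOREACH(e, head)
		sum += e->pid;
	return sum;
}
```

callers of such a macro never see the placeholder entries, so an iterator
can park a marker in the list and later resume from it without other
walkers mistaking the marker for a process.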


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep awaiting the debugger's opinion
about the signal without any problem. The hook is hence located in
issignal, at the place where normally SIGCHLD is sent to the debugger,
while the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception), and stop the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct emul to provide an emulation specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specificities should now be handled in OS specific code. Linux
has been done, but other emulations should be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data that has been collected so far:

                                   Net  Free  Open  Linux  SunOS  AIX  OSF1  Darwin
send SIGIO to write end of pipe    Y    N     N     N      N      N    Y     Y
send SIGIO to read end of pipe     Y    Y     N     N      N      ?    Y     ?
send SIGIO to write end of socket  Y    Y     Y     N      N      Y    Y     Y
send SIGIO to read end of socket   Y    Y     Y     Y      Y      ?    Y     ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke several
emulation package builds


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.
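
The hook mechanism described in this entry can be sketched as a table of optional function pointers. This is a minimal illustration, not the kernel's code: the structures below are simplified stand-ins, and emul_call_exit(), demo_proc_exit() and demo_emul are hypothetical names introduced only for the example.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption). */
struct proc;

struct emul {
	const char *e_name;
	void (*e_proc_fork)(struct proc *child, struct proc *parent);
	void (*e_proc_exec)(struct proc *p);
	void (*e_proc_exit)(struct proc *p);
};

struct proc {
	const struct emul *p_emul;
	void *p_emuldata;	/* per-process emulation-specific data */
};

/* Hypothetical helper: call the exit hook only if the emulation has one. */
static void
emul_call_exit(struct proc *p)
{
	if (p->p_emul != NULL && p->p_emul->e_proc_exit != NULL)
		(*p->p_emul->e_proc_exit)(p);
}

/* Hypothetical hook: drop whatever the emulation hung off p_emuldata. */
static void
demo_proc_exit(struct proc *p)
{
	p->p_emuldata = NULL;
}

/* An emulation that only cares about process exit. */
static const struct emul demo_emul = {
	.e_name = "demo",
	.e_proc_exit = demo_proc_exit,
};
```

Leaving unused hooks NULL keeps emulations that need no per-process state free of boilerplate.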


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
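
A compatibility macro of the kind mentioned above could look like the following sketch. The ltsleep() here is a toy that only records its arguments instead of sleeping, and struct simplelock is a placeholder type. The argument list is an assumption made for the example.

```c
#include <stddef.h>

/* Placeholder for the kernel's simple lock type (assumption). */
struct simplelock { int lock_data; };

/* Records what the last call received so the wrapper can be inspected. */
static struct simplelock *last_interlock;
static int last_timo;

/* Toy ltsleep(): a real one would block; this one only records arguments. */
static int
ltsleep(void *ident, int priority, const char *wmesg, int timo,
    struct simplelock *interlock)
{
	(void)ident;
	(void)priority;
	(void)wmesg;
	last_interlock = interlock;
	last_timo = timo;
	return 0;
}

/* Old API: the same call with no interlock to release. */
#define tsleep(ident, priority, wmesg, timo) \
	ltsleep((ident), (priority), (wmesg), (timo), NULL)
```

Callers that have no lock to hand over keep using tsleep() unchanged, while new callers pass the lock that protects the condition they sleep on.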


# 1.97 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.
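
The idea behind a bitmap of non-empty run queues such as sched_whichqs can be sketched as below. The queue count of 32, the helper names, and the convention that a lower queue number means higher priority are assumptions for illustration only.

```c
#include <stdint.h>

#define NQS 32	/* assumed number of run queues, one bit each */

/* Bit i set means run queue i holds at least one process. */
static volatile uint32_t sched_whichqs;

/* Hypothetical helpers to flip a queue's bit. */
static void
mark_nonempty(int q)
{
	sched_whichqs |= (UINT32_C(1) << q);
}

static void
mark_empty(int q)
{
	sched_whichqs &= ~(UINT32_C(1) << q);
}

/* Return the lowest-numbered non-empty queue, or -1 if all are empty. */
static int
first_nonempty(void)
{
	uint32_t bits = sched_whichqs;

	for (int q = 0; q < NQS; q++) {
		if (bits & (UINT32_C(1) << q))
			return q;
	}
	return -1;
}
```

The bitmap lets the scheduler skip empty queues without touching their list heads.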


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.
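
The element-size argument of KERN_PROC2 can be illustrated with a small sketch: the producer copies at most the caller's element size per record, so a binary built against an older, shorter structure still gets correctly strided data. struct kinfo_demo and copy_records() are hypothetical and far simpler than the real interface.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record; the real struct kinfo_proc2 has many more fields. */
struct kinfo_demo {
	int p_pid;
	int p_stat;
	int p_newfield;	/* imagined later addition */
};

/*
 * Copy up to `count` records into `out`, giving each one exactly
 * `elemsize` bytes.  Records are truncated or zero-padded as needed.
 * Returns the number of bytes written.
 */
static size_t
copy_records(const struct kinfo_demo *src, size_t count,
    void *out, size_t outlen, size_t elemsize)
{
	char *dst = out;
	size_t ncopy = elemsize < sizeof(*src) ? elemsize : sizeof(*src);
	size_t done = 0;

	if (elemsize == 0)
		return 0;
	for (size_t i = 0; i < count && done + elemsize <= outlen; i++) {
		memset(dst + done, 0, elemsize);
		memcpy(dst + done, &src[i], ncopy);
		done += elemsize;
	}
	return done;
}
```

New fields can only be appended at the end of the record for this scheme to stay compatible.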


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
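
The sizing rule in this entry (a quarter of physical memory, clamped to bounds) can be sketched as follows. The bound values and the function name kmem_pages() are made up for the example. In the commit the bounds come from machine-dependent code or the kernel config file.

```c
#include <stddef.h>

/* Assumed bounds, in pages, chosen only for illustration. */
#define DEMO_NKMEMPAGES_MIN	1024
#define DEMO_NKMEMPAGES_MAX	32768

/* Hypothetical helper: physmem / 4, clamped to [MIN, MAX]. */
static size_t
kmem_pages(size_t physmem_pages)
{
	size_t n = physmem_pages / 4;

	if (n < DEMO_NKMEMPAGES_MIN)
		n = DEMO_NKMEMPAGES_MIN;
	if (n > DEMO_NKMEMPAGES_MAX)
		n = DEMO_NKMEMPAGES_MAX;
	return n;
}
```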


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format and resource limits). The process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock every where else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
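
A macro with the behaviour described for P_ZOMBIE() might be written as in this sketch. The numeric state values and the one-field struct proc are assumptions for illustration.

```c
/* Process states; the numeric values here are assumed. */
enum {
	SIDL = 1,
	SRUN,
	SSLEEP,
	SSTOP,
	SZOMB,	/* reaped by the reaper, waiting for wait4 */
	SDEAD	/* dead, but not yet processed by the reaper */
};

/* Minimal stand-in for struct proc. */
struct proc {
	int p_stat;
};

/* True for a process in either of the two "dead" states. */
#define P_ZOMBIE(p) \
	((p)->p_stat == SZOMB || (p)->p_stat == SDEAD)
```

Code that only needs to know "is this process gone?" can test P_ZOMBIE() and stay unaware of the intermediate SDEAD state.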


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
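
The effect of biasing p_nice by NZERO can be shown with a short sketch. NZERO is 20 as stated in the entry. The nice range of -20..20, the type name and the two conversion helpers are assumptions introduced for the example.

```c
#define NZERO		20	/* bias applied to the nice value, per the entry */
#define DEMO_PRIO_MIN	(-20)	/* assumed lowest user-visible nice value */
#define DEMO_PRIO_MAX	20	/* assumed highest user-visible nice value */

typedef unsigned char demo_u_char;

/* Hypothetical helper: user-visible nice value to stored p_nice. */
static demo_u_char
nice_to_p_nice(int nice)
{
	if (nice < DEMO_PRIO_MIN)
		nice = DEMO_PRIO_MIN;
	if (nice > DEMO_PRIO_MAX)
		nice = DEMO_PRIO_MAX;
	return (demo_u_char)(nice + NZERO);
}

/* Hypothetical helper: stored p_nice back to the user-visible value. */
static int
p_nice_to_nice(demo_u_char p_nice)
{
	return (int)p_nice - NZERO;
}
```

With the bias equal to the magnitude of the lowest nice value, the stored field never goes negative, so an unsigned char is sufficient.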


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.
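
Why a count works better than a pair of flags can be seen in a small sketch: two independent holders can overlap, and the process stays pinned until the last one lets go. The field name p_holdcnt and the three helper functions are assumptions made for the example.

```c
#include <assert.h>

/* Minimal stand-in for struct proc with an assumed hold counter field. */
struct proc {
	int p_holdcnt;
};

/* Hypothetical helper: pin the process in core. */
static void
proc_hold(struct proc *p)
{
	p->p_holdcnt++;
}

/* Hypothetical helper: drop one hold; every release needs a prior hold. */
static void
proc_release(struct proc *p)
{
	assert(p->p_holdcnt > 0);
	p->p_holdcnt--;
}

/* The process may be swapped out only when nobody holds it. */
static int
proc_swappable(const struct proc *p)
{
	return p->p_holdcnt == 0;
}
```

With a single flag, the first holder to finish would clear it while the second still depended on it.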


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location , and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.337 14-Jan-2017 kamil

Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)

Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread designed hardware assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;

#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp

This keeps size of ptrace_state_t unchanged as both pid_t and lwpid_t are
defined as int32_t-like integer. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
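
The size argument above can be checked with a small user-space sketch. The
demo_* types are stand-ins invented for this illustration (assuming, as the
message says, that pid_t and lwpid_t are both int32_t-like); this is not the
kernel header itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for pid_t and lwpid_t, assumed here to be 32-bit integers. */
typedef int32_t demo_pid_t;
typedef int32_t demo_lwpid_t;

/* Layout before the change. */
struct demo_state_old {
	int		pe_report_event;
	demo_pid_t	pe_other_pid;
};

/* Layout after the change: the second slot becomes a union. */
struct demo_state_new {
	int		pe_report_event;
	union {
		demo_pid_t	_pe_other_pid;
		demo_lwpid_t	_pe_lwp;
	} _option;
};

/* Returns 1 when size and member offset are identical in both layouts. */
static int
demo_layouts_match(void)
{
	return sizeof(struct demo_state_old) == sizeof(struct demo_state_new) &&
	    offsetof(struct demo_state_old, pe_other_pid) ==
	    offsetof(struct demo_state_new, _option);
}

/* Accessor macros, as in the commit message, keep old source compiling. */
#define pe_other_pid	_option._pe_other_pid
#define pe_lwp		_option._pe_lwp

/* Old-style field access still works against the new layout. */
static demo_pid_t
demo_read_other_pid(const struct demo_state_new *s)
{
	return s->pe_other_pid;
}
```

Because both union members have the same width and share one slot, code built
against the old layout reads the same bytes it always did.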


Introduce new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>


# 1.336 13-Jan-2017 kamil

Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)

PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when a
parent gives birth to a new child process and stops until the child exits or
calls exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE notifies a debugger that a parent has resumed after a
vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>


Revision tags: pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
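
A minimal sketch of what such a macro might look like, assuming a per-process
flag marks 32-bit emulation. DEMO_PK_32, its value, struct demo_proc and
DEMO_PROC_PTRSZ are hypothetical names for this illustration; the real
definition is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value marking a 32-bit process on a 64-bit kernel. */
#define DEMO_PK_32	0x00000004

/* Reduced stand-in for struct proc: only the flag word matters here. */
struct demo_proc {
	int	p_flag;
};

/* Pointer size as seen by the process, not by the kernel. */
#define DEMO_PROC_PTRSZ(p) \
	(((p)->p_flag & DEMO_PK_32) ? sizeof(uint32_t) : sizeof(void *))
```

Code that copies pointer arrays to or from userland can then size its
transfers with the macro instead of hard-coding sizeof(void *).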


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
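
To illustrate how the three separate pieces relate to the old composite
value, here is a sketch that rebuilds a wait(2)-style status from them. It
assumes the traditional encoding (exit code in bits 8-15, signal number in
the low 7 bits, core flag 0x80); demo_make_status is a hypothetical helper,
not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed traditional encoding of the core-dump bit in a wait status. */
#define DEMO_WCOREFLAG	0x80

/*
 * Rebuild a composite status from separately stored parts:
 * xexit is the exit code, xsig the terminating signal (0 if none),
 * coredumped says whether a core file was written.
 */
static int
demo_make_status(int xexit, int xsig, bool coredumped)
{
	if (xsig != 0)
		return (xsig & 0x7f) | (coredumped ? DEMO_WCOREFLAG : 0);
	return (xexit & 0xff) << 8;
}
```

Keeping the parts separate means each consumer reads only the field it needs,
and the composite is built in one place when wait(2) reports it.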


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits the execution of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS flag is simply ignored with virtually no cpu cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.
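
The idea can be sketched as follows, with a reduced stand-in for struct proc.
The demo_* names are hypothetical, and the locking is only described in
comments since this is a single-threaded illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for struct proc. */
struct demo_proc {
	int			p_pid;
	int			p_ppid;	/* cached copy of p_pptr->p_pid */
	struct demo_proc	*p_pptr;	/* parent; guarded by the list lock */
};

/*
 * Reparenting is the slow path. In a kernel it would run with the
 * process-list lock held, and it refreshes the cached parent PID.
 */
static void
demo_reparent(struct demo_proc *child, struct demo_proc *parent)
{
	child->p_pptr = parent;
	child->p_ppid = parent->p_pid;
}

/*
 * The fast path reads one integer from the caller's own structure, so
 * it does not need to take the global lock to chase p_pptr.
 */
static int
demo_getppid(const struct demo_proc *self)
{
	return self->p_ppid;
}
```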


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.
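
As a rough illustration of what "N bits" of randomization means, the sketch
below turns a random word into an address offset. It assumes the offset is
page-granular with 4 KiB pages; that granularity, DEMO_PAGE_SHIFT and
demo_aslr_delta are assumptions of this example, not details taken from the
commit.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT	12	/* assumed 4 KiB pages */

/*
 * Keep the low 'bits' bits of a random word and scale them to whole
 * pages, giving 2^bits possible placements for the region.
 */
static uint64_t
demo_aslr_delta(uint32_t rnd, unsigned bits)
{
	uint64_t mask = ((uint64_t)1 << bits) - 1;

	return ((uint64_t)rnd & mask) << DEMO_PAGE_SHIFT;
}
```

With 12 bits this yields 4096 possible offsets; 16 or 24 bits widen the range
accordingly.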


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.
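
A user-space sketch of the technique, using C11 <stdatomic.h> rather than the
kernel's own atomic operations. demo_nprocs and the two helpers are
hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Global process count, updated without taking a lock. */
static atomic_uint demo_nprocs;

/* Called when a process is created; returns the new count. */
static unsigned
demo_proc_created(void)
{
	return atomic_fetch_add(&demo_nprocs, 1) + 1;
}

/* Called when a process is destroyed; returns the new count. */
static unsigned
demo_proc_destroyed(void)
{
	return atomic_fetch_sub(&demo_nprocs, 1) - 1;
}
```

The counter stays consistent under concurrent fork and exit without
serializing on a global lock just to bump an integer.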


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes to sleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.
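
One way a bug like the first item can arise is a truncating unit conversion
combined with a convention that a timeout of zero means "no timeout". The
sketch below shows that failure mode and a rounding fix; it is an assumption
for illustration, and the log does not show the actual code that was changed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convention assumed for this sketch: a timeout of 0 microseconds
 * means "sleep until explicitly woken".
 */

/* Truncating conversion: anything under 1000 ns collapses to 0. */
static uint64_t
demo_ns_to_us_truncating(uint64_t ns)
{
	return ns / 1000;
}

/* Rounding up keeps a short but nonzero timeout nonzero. */
static uint64_t
demo_ns_to_us_rounding_up(uint64_t ns)
{
	return (ns + 999) / 1000;
}
```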


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give the same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time.


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's a more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout from the proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but no longer do.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
on each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is not likely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
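
A minimal sketch of why a fixed-point p_estcpu decays more slowly under load. It assumes the classic BSD decay formula, estcpu * loadfac / (loadfac + 1) with loadfac = 2 * loadavg; the names, the 11 fractional bits, and the sample load of 20 are illustrative assumptions, not values taken from this commit.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the kernel's definitions. */
#define EX_FSHIFT	11			/* fractional bits */
#define EX_FSCALE	(1u << EX_FSHIFT)	/* fixed-point 1.0 */

typedef uint32_t ex_fixpt_t;

/* Decay with plain integers: the result is truncated to a whole number. */
static unsigned
ex_decay_int(unsigned loadfac, unsigned est)
{
	assert(loadfac != UINT_MAX);	/* keep the divisor nonzero */
	return (unsigned)(((unsigned long long)est * loadfac) /
	    ((unsigned long long)loadfac + 1));
}

/* Same decay with both operands scaled by EX_FSCALE. */
static ex_fixpt_t
ex_decay_fixpt(ex_fixpt_t loadfac, ex_fixpt_t est)
{
	return (ex_fixpt_t)(((uint64_t)est * loadfac) /
	    ((uint64_t)loadfac + EX_FSCALE));
}
```

With loadfac = 40, an integer estcpu of 1 truncates to 0, so the single tick gained from schedclock() is lost in one decay pass. The fixed-point value for 1.0 (2048) decays to 1998, roughly 0.976, so the tick survives and can accumulate.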


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
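
A minimal sketch of a marker-skipping iteration macro of the kind described above. The list layout, the is_marker field, and all ex_ names are hypothetical; the real PROCLIST_FOREACH works on the kernel's own list structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a process list entry. */
struct ex_proc {
	struct ex_proc *next;
	int is_marker;	/* nonzero for a placeholder left by an iterator */
	int pid;
};

/* Advance from p to the first non-marker entry, or NULL at the end. */
static struct ex_proc *
ex_skip_markers(struct ex_proc *p)
{
	while (p != NULL && p->is_marker)
		p = p->next;
	return p;
}

#define EX_PROCLIST_FOREACH(var, head)				\
	for ((var) = ex_skip_markers(head);			\
	    (var) != NULL;					\
	    (var) = ex_skip_markers((var)->next))

/* Sum the pids of real entries; markers contribute nothing. */
static int
ex_sum_pids(struct ex_proc *head)
{
	struct ex_proc *p;
	int sum = 0;

	EX_PROCLIST_FOREACH(p, head) {
		assert(!p->is_marker);
		sum += p->pid;
	}
	return sum;
}
```

Callers of the macro never see a marker, so an iterator that parks a marker in the list while it runs a callback does not disturb other walkers.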


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep awaiting the debugger's opinion
about the signal without any problem. The hook is hence located in
issignal, at the place where normally SIGCHLD is sent to the debugger
while the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception) and stopping the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.
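
A minimal sketch of the dispatch rule described above: the caller may name the next process, and a NULL argument falls back to picking one from the run queue. All ex_ names and the single-slot run queue are hypothetical simplifications; the real chooseproc() waits for a runnable process rather than asserting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct proc. */
struct ex_proc { int pid; };

/* Hypothetical one-slot run queue. */
static struct ex_proc *ex_runq_next;

static struct ex_proc *
ex_chooseproc(void)
{
	assert(ex_runq_next != NULL);	/* real code would wait here */
	return ex_runq_next;
}

/*
 * Return the process that runs next. *switched is set to 0 when the
 * target is the current process, which is the "switching to self"
 * case that allows the context switch to be skipped.
 */
static struct ex_proc *
ex_mi_switch(struct ex_proc *cur, struct ex_proc *newp, int *switched)
{
	if (newp == NULL)
		newp = ex_chooseproc();
	*switched = (newp != cur);
	return newp;
}
```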


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct proc to provide an emulation specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specificities should now be handled in OS specific code. Linux
has been done, but other emulations still need to be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data that has been collected so far:

                                   Net  Free  Open  Linux  SunOS  AIX  OSF1  Darwin
send SIGIO to write end of pipe    Y    N     N     N      N      N    Y     Y
send SIGIO to read end of pipe     Y    Y     N     N      N      ?    Y     ?
send SIGIO to write end of socket  Y    Y     Y     N      N      Y    Y     Y
send SIGIO to read end of socket   Y    Y     Y     Y      Y      ?    Y     ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke the build
of several emulation packages


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
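
A compatibility macro of the kind mentioned here might look like the sketch below. The names ltsleep_sketch, tsleep_sketch and struct simplelock_sketch are stand-ins invented for this example, and the stub body only imitates the interlock hand-off; it does not sleep.

```c
#include <stddef.h>

/* Stand-in for the kernel's simple lock type (an assumption). */
struct simplelock_sketch {
	int locked;
};

/* Records whether the last call was made without an interlock. */
static int last_interlock_was_null;

/*
 * Hypothetical stub with an ltsleep()-like shape: the extra last
 * argument is the interlock, released once the sleeper is queued.
 */
static int
ltsleep_sketch(const void *chan, int pri, const char *wmesg, int timo,
    struct simplelock_sketch *interlock)
{
	(void)chan; (void)pri; (void)wmesg; (void)timo;
	last_interlock_was_null = (interlock == NULL);
	if (interlock != NULL)
		interlock->locked = 0;  /* dropped on the sleeper's behalf */
	return 0;
}

/* The old four-argument API expressed in terms of the new one. */
#define tsleep_sketch(chan, pri, wmesg, timo) \
	ltsleep_sketch((chan), (pri), (wmesg), (timo), NULL)
```

Existing callers keep compiling unchanged, while new callers can pass the lock that protects their wakeup condition.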


# 1.97 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.
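
The reason the caller passes the element size can be illustrated with a small user-space model. Everything here is hypothetical (struct kinfo_sketch, copy_records() and its fields); it only shows how a producer can serve callers built against an older, shorter record layout.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record; the real struct kinfo_proc2 has many more fields. */
struct kinfo_sketch {
	int  pid;
	int  ppid;
	long added_later;       /* stands in for a field appended later */
};

/*
 * Copy count records into a caller buffer whose elements are elemsize
 * bytes each.  Only the prefix the caller asked for is copied, so a
 * binary built against an older, shorter layout still gets sane data.
 * Returns the number of bytes written.
 */
static size_t
copy_records(void *dst, size_t elemsize, const struct kinfo_sketch *src,
    size_t count)
{
	size_t ncopy = elemsize < sizeof(*src) ? elemsize : sizeof(*src);
	char *out = dst;

	for (size_t i = 0; i < count; i++) {
		memset(out, 0, elemsize);       /* zero any unfilled tail */
		memcpy(out, &src[i], ncopy);    /* copy the known prefix */
		out += elemsize;
	}
	return elemsize * count;
}
```

This works only if new fields are appended at the end of the record, which is the usual rule for such interfaces.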


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.
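
One way to track this is with two flag bits, as sketched below. The flag names and the helper functions are assumptions for illustration and are not taken from the commit itself.

```c
/* Hypothetical scheduler flag bits. */
#define SEENRR      0x1     /* process has seen one round-robin tick */
#define SHOULDYIELD 0x2     /* process ran a full cycle; ask it to yield */

/* Called on every round-robin tick for the running process. */
static void
roundrobin_tick(int *flags)
{
	if (*flags & SEENRR)
		*flags |= SHOULDYIELD;  /* second tick without a switch */
	else
		*flags |= SEENRR;       /* first tick since the last switch */
}

/* Called when the process gives up the CPU; the history starts over. */
static void
on_switch(int *flags)
{
	*flags &= ~(SEENRR | SHOULDYIELD);
}
```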


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
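
The sizing rule amounts to a divide followed by a clamp. A minimal sketch, assuming sizes are counted in pages and using a hypothetical helper name:

```c
#include <stddef.h>

/*
 * Hypothetical helper: a quarter of physical memory, clamped to the
 * bounds supplied by machine-dependent code or the kernel config.
 */
static size_t
nkmempages_sketch(size_t physpages, size_t min_pages, size_t max_pages)
{
	size_t n = physpages / 4;

	if (n < min_pages)
		n = min_pages;
	if (n > max_pages)
		n = max_pages;
	return n;
}
```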


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). A process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.
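
The selection rule can be sketched over a plain array standing in for a sleep queue. The struct, the function name and the convention that a lower number means higher priority are assumptions of this sketch.

```c
#include <stddef.h>

/* Hypothetical sleep queue entry, listed in arrival order. */
struct sleeper {
	const void *wchan;      /* identifier being slept on */
	int priority;           /* lower value = higher priority (assumed) */
	int awake;
};

/*
 * Wake exactly one sleeper on ident: the highest priority one, and
 * among equals the one that has been in line longest.
 */
static struct sleeper *
wakeup_one_sketch(struct sleeper *q, size_t n, const void *ident)
{
	struct sleeper *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (q[i].awake || q[i].wchan != ident)
			continue;
		/* strict '<' keeps the earliest entry on a tie */
		if (best == NULL || q[i].priority < best->priority)
			best = &q[i];
	}
	if (best != NULL)
		best->awake = 1;
	return best;
}
```

A plain wakeup() would instead mark every matching sleeper awake, which is the Thundering Herd this entry is trying to avoid.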


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock every where else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
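
A macro of this kind could be written as below. The state values and the struct are illustrative assumptions; only the idea that both SZOMB and SDEAD count as "zombie" comes from the entry above.

```c
/* Illustrative state values; the real ones are in <sys/proc.h>. */
enum { SRUN = 2, SZOMB = 5, SDEAD = 6 };

struct zproc {
	int p_stat;
};

/* True for a process that is dead, whether or not it has been reaped. */
#define P_ZOMBIE_SKETCH(p) \
	((p)->p_stat == SZOMB || (p)->p_stat == SDEAD)
```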


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).
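
The exit-signal handling described in the two list items above might be modelled as follows. The struct and helper names are hypothetical; the sketch only captures "0 means no signal" and "reparenting to init resets the signal to SIGCHLD".

```c
#include <signal.h>

struct xproc {
	int p_exitsig;          /* signal sent to the parent on exit; 0 = none */
};

/* An exit signal of 0 means the parent is not signalled at all. */
static int
should_signal_parent(const struct xproc *p)
{
	return p->p_exitsig != 0;
}

/* On reparenting to init, fall back to the conventional SIGCHLD. */
static void
reparent_to_init_sketch(struct xproc *p)
{
	p->p_exitsig = SIGCHLD;
}
```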


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is the
header files as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adopt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
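
With NZERO at 20, the user-visible nice range of -20..20 maps onto 0..40, which fits an unsigned char. A minimal sketch with hypothetical helper names:

```c
#define NZERO_SKETCH 20         /* offset applied to the user-visible nice */

typedef unsigned char u_char_sketch;

/* User-visible nice (-20..20) to the stored, always non-negative value. */
static u_char_sketch
nice_to_p_nice(int nice)
{
	return (u_char_sketch)(nice + NZERO_SKETCH);
}

/* Stored value back to the user-visible nice. */
static int
p_nice_to_nice(u_char_sketch p_nice)
{
	return (int)p_nice - NZERO_SKETCH;
}
```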


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.
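
In simplified form, such macros are a counted hold. The sketch below leaves out any swap-in work the real macros may do; the _SKETCH names and struct hproc are invented for this example.

```c
struct hproc {
	int p_holdcnt;          /* nonzero: keep the process in core */
};

/* Take and drop a hold; holds nest, so each PHOLD needs one PRELE. */
#define PHOLD_SKETCH(p) ((void)((p)->p_holdcnt++))
#define PRELE_SKETCH(p) ((void)((p)->p_holdcnt--))

/* A process may be swapped out only when nobody holds it. */
static int
swappable_sketch(const struct hproc *p)
{
	return p->p_holdcnt == 0;
}
```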


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location , and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.335 19-Oct-2016 skrll

PR kern/51514: ptrace(2) fails for 32-bit process on 64-bit kernel

Updated from the original patch in the PR by me.


Revision tags: nick-nhusb-base-20161004
# 1.334 29-Sep-2016 christos

Introduce and use PROC_PTRSZ() to handle differing pointer size 64->32
emulation.
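
The idea can be sketched as a per-process pointer size lookup. The flag value, struct and function name below are assumptions for illustration, not the header's actual definition.

```c
#include <stddef.h>
#include <stdint.h>

#define PK_32_SKETCH 0x4        /* hypothetical "32-bit process" flag bit */

struct pproc {
	int p_flag;
};

/* Pointer size as seen by the process, not by the kernel. */
static size_t
proc_ptrsz_sketch(const struct pproc *p)
{
	return (p->p_flag & PK_32_SKETCH) ? sizeof(uint32_t) : sizeof(void *);
}
```

Code that walks user-space pointer arrays, such as argv, can then step by this size instead of the kernel's own sizeof(void *).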


# 1.333 23-Sep-2016 skrll

Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions


Revision tags: localcount-20160914
# 1.332 13-Sep-2016 martin

Allow emulations to override the creation of ktrace records for posting
signals. In compat_netbsd32 use this to write the 32bit version of
the records, so a 32bit userland kdump is happy.


Revision tags: pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.331 10-Jun-2016 christos

branches: 1.331.2;
GSoC 2016: Charles Cui: add SEM_NSEMS_MAX


Revision tags: nick-nhusb-base-20160529
# 1.330 27-Apr-2016 christos

We need a flag for WCONTINUED so that we can reset it... Fixes bash issue.


Revision tags: nick-nhusb-base-20160422
# 1.329 04-Apr-2016 christos

no need to pass the coredump flag to exit1() since it is set and known
in one place.


# 1.328 04-Apr-2016 christos

Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
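
With the three pieces stored separately, a wait(2)-style status can be assembled on demand. The sketch below assumes the traditional encoding (signal in the low 7 bits, core flag 0200, exit code in bits 8-15); the struct and function names are hypothetical.

```c
#define WCOREFLAG_SKETCH 0200   /* traditional "core dumped" bit */

struct wproc {
	int p_xexit;            /* exit code */
	int p_xsig;             /* terminating signal, 0 if exited normally */
	int coredumped;         /* stands in for the core-dump flag bit */
};

/* Build the composite status from the separately stored fields. */
static int
wait_status_sketch(const struct wproc *p)
{
	if (p->p_xsig != 0)
		return p->p_xsig | (p->coredumped ? WCOREFLAG_SKETCH : 0);
	return (p->p_xexit & 0xff) << 8;
}
```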


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.327 01-Dec-2015 pgoyette

Finish the rename from sc_auto --> sc_autoload

(Thanks, brad harder)


# 1.326 30-Nov-2015 pgoyette

Rename sc_auto to sc_autoload at suggestion of christos@


# 1.325 30-Nov-2015 pgoyette

Make the list of syscalls which can trigger a module autoload an
attribute of each emulation, rather than having a single global
list which applies only to the default emulation.

This changes 'struct emul' so

Welcome to 7.99.23 !


# 1.324 26-Nov-2015 martin

We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.


# 1.323 24-Sep-2015 christos

Add proc_find_locked(), which returns the process locked and does the
sysctl access check.


Revision tags: nick-nhusb-base-20150921
# 1.322 19-Jun-2015 martin

Make kill1 public (we'll need it from compat/netbsd32)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406
# 1.321 07-Mar-2015 christos

add dtrace syscall glue:
- adds 2 members to sysent: these are the entry and exit probe ids
they are non-zero only when dtrace is loaded
- add an emul specific probe for dtrace: this is NULL unless the emulation
supports dtrace and is loaded
- adjust the syscall stub to call trace_enter/exit if needed for systrace
- add more info to trace_enter and exit needed by systrace


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.320 21-Feb-2014 skrll

branches: 1.320.6;
Remove struct simplelock forward declaration.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8
# 1.319 02-Jan-2013 dsl

branches: 1.319.2;
Only expose the bulk of sys/proc.h and sys/lwp.h if _KERNEL or _KMEMUSER
is defined.
i386 and amd64 build ok.


Revision tags: yamt-pagecache-base7
# 1.318 05-Dec-2012 msaitoh

sys/proc.h refers to sizeof(struct pcb), so include <machine/pcb.h>.


Revision tags: yamt-pagecache-base6
# 1.317 22-Jul-2012 rmind

branches: 1.317.2;
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.316 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.315 11-Feb-2012 martin

Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.


# 1.314 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


# 1.313 05-Jan-2012 reinoud

Revert MAP_NOSYSCALLS patch.


# 1.312 20-Dec-2011 reinoud

Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.311 21-Oct-2011 christos

branches: 1.311.2; 1.311.6;
add proc_compare prototype.


# 1.310 02-Sep-2011 christos

Add support for PTRACE_FORK.
- add a field in struct proc to save the forker/forkee pid, and a flag.
- add 3 new ptrace calls: PT_GET_PROCESS_STATE, PT_GET_EVENT_MASK,
PT_SET_EVENT_MASK
Add a PT_STRINGS constant so that we don't hard-code the list of ptrace
subcalls in other programs (kdump).


# 1.309 31-Aug-2011 jmcneill

PR# kern/45312: ptrace: PT_SETREGS can't alter system calls

Add a new PT_SYSCALLEMU request that cancels the current syscall, for
use with PT_SYSCALL.


# 1.308 27-Jul-2011 uebayasi

Forward-declare struct vmspace to reduce dependencies on uvm/uvm_extern.h.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.307 02-May-2011 rmind

Update a few comments.


# 1.306 01-May-2011 rmind

- Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.


# 1.305 27-Apr-2011 rmind

G/C M_EMULDATA


# 1.304 18-Apr-2011 rmind

Replace malloc with kmem, and remove M_SUBPROC.


# 1.303 13-Apr-2011 mrg

expose the KSTACK_LOWEST_ADDR and KSTACK_SIZE to _KMEMUSER as well,
like the x86 versions do. for crash(8).


# 1.302 08-Mar-2011 pooka

Nuke all threads belonging to a process calling exec before allowing
the exec handshake to return.

In addition to being The Right Thing To Do, fixes some nasty
conditions for CLOEXEC fd's (or at least does so in theory, I
couldn't create any problems although I tried).


Revision tags: bouyer-quota2-nbase
# 1.301 04-Mar-2011 joerg

Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.300 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


Revision tags: jruoho-x86intr-base
# 1.299 14-Jan-2011 rmind

branches: 1.299.2; 1.299.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.298 07-Jul-2010 chs

many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.


# 1.297 01-Jul-2010 rmind

Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.296 03-Mar-2010 yamt

branches: 1.296.2;
comment


# 1.295 21-Feb-2010 darran

Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.


Revision tags: uebayasi-xip-base matt-premerge-20091211
# 1.294 10-Dec-2009 matt

branches: 1.294.2;
Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds). Should result in no code differences.


# 1.293 04-Nov-2009 rmind

do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden. Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).


Revision tags: jym-xensuspend-nbase
# 1.292 22-Oct-2009 rmind

Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.


# 1.291 02-Oct-2009 elad

Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.290 27-May-2009 yamt

add comments on KSTACK_LOWEST_ADDR/KSTACK_SIZE.


Revision tags: yamt-nfs-mp-base4
# 1.289 14-May-2009 yamt

update a comment.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.288 25-Apr-2009 rmind

- Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.


# 1.287 19-Apr-2009 rmind

- Remove a bunch of unused declarations in proc.h header.
- Move yield() and suspendsched() to sched.h, where they should belong.


# 1.286 16-Apr-2009 rmind

- Manage pid_table with kmem(9).
- Remove M_PROC and unused M_SESSION.


# 1.285 16-Apr-2009 rmind

Avoid a few #ifdef KSTACK_CHECK_MAGIC.


# 1.284 28-Mar-2009 rmind

Make inferior() function static, rename to p_inferior(), return bool.


Revision tags: nick-hppapmap-base2 haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base mjf-devfs2-base
# 1.283 19-Nov-2008 ad

branches: 1.283.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2
# 1.282 22-Oct-2008 ad

branches: 1.282.2; 1.282.4;
We may want to patch emul::e_sysent[] so drop the const.


Revision tags: haad-dm-base1
# 1.281 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.280 16-Jun-2008 ad

branches: 1.280.2;
- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


# 1.279 04-Jun-2008 ad

branches: 1.279.2;
Make sure the PAX flags are copied/zeroed correctly.


# 1.278 03-Jun-2008 ad

Don't use proc specificdata. Speeds up mmap() and others.


Revision tags: yamt-pf42-base3
# 1.277 02-Jun-2008 ad

Most contention on proc_lock is from getppid(), so cache the parent's PID.
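
A minimal sketch of the caching idea, assuming a pared-down process record. The names proc_sketch, proc_reparent_sketch and getppid_sketch are hypothetical and are not the kernel's own; the point is only that the parent's PID is copied whenever the parent pointer changes, so the read path never has to follow the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down process record; not the real struct proc. */
struct proc_sketch {
	int			 p_pid;		/* this process's ID */
	int			 p_ppid;	/* cached copy of the parent's ID */
	struct proc_sketch	*p_pptr;	/* parent; needs the global lock */
};

/*
 * Reparenting is the slow path.  In a kernel it would run with the
 * process-list lock held; the cached PID is refreshed together with
 * the pointer so the two cannot disagree once the lock is dropped.
 */
static void
proc_reparent_sketch(struct proc_sketch *child, struct proc_sketch *parent)
{
	assert(child != NULL && parent != NULL);
	child->p_pptr = parent;
	child->p_ppid = parent->p_pid;
}

/* Fast path: no pointer chase, so the global lock is not needed here. */
static int
getppid_sketch(const struct proc_sketch *p)
{
	return p->p_ppid;
}
```

The trade-off is one extra field per process in exchange for taking the most frequent reader off the contended lock.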


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.276 29-Apr-2008 ad

branches: 1.276.2;
Move override of curlwp into lwp.h.


# 1.275 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.274 25-Apr-2008 ad

branches: 1.274.2;
semexit: do nothing if the process has not used semaphores.


# 1.273 24-Apr-2008 ad

Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.272 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.271 17-Mar-2008 yamt

branches: 1.271.2;
- simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.


Revision tags: nick-net80211-sync-base hpcarm-cleanup-base
# 1.270 19-Feb-2008 ad

branches: 1.270.2; 1.270.6;
Update field markings that describe which locks protect what.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base mjf-devfs-base matt-armv6-base
# 1.269 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.268 02-Jan-2008 ad

Merge vmlocking2 to head.


# 1.267 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.266 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.265 26-Dec-2007 ad

Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.


# 1.264 25-Dec-2007 perry

Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h


# 1.263 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.262 04-Dec-2007 ad

branches: 1.262.4;
Use atomics to maintain nprocs.
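
As an illustration only, a lock-free counter of this kind can be written with C11 atomics. The kernel has its own atomic primitives; stdatomic.h is a stand-in here, and the names nprocs_sketch, nprocs_inc and nprocs_dec are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Global process count, updated without taking any lock. */
static atomic_uint nprocs_sketch;

/* Called when a process is created: returns the count after the increment. */
static unsigned
nprocs_inc(void)
{
	return atomic_fetch_add(&nprocs_sketch, 1) + 1;
}

/* Called when a process is reaped: returns the count after the decrement. */
static unsigned
nprocs_dec(void)
{
	unsigned old = atomic_fetch_sub(&nprocs_sketch, 1);

	assert(old > 0);	/* never reap more than were created */
	return old - 1;
}
```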


Revision tags: vmlocking2-base1 bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.261 12-Nov-2007 ad

branches: 1.261.2;
Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.


# 1.260 07-Nov-2007 ad

Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.


Revision tags: jmcneill-base
# 1.259 06-Nov-2007 ad

Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.258 01-Nov-2007 dsl

branches: 1.258.2;
Use one byte of p_pad1[] for p_trace_enabled where xxx_syscall_intern()
can save the result of trace_is_enabled() so that it can be efficiently
determined on every system call without having 2 separate syscall functions.
The death of syscall_fancy() looms.


# 1.257 24-Oct-2007 ad

Make ras_lookup() lockless.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base
# 1.256 12-Oct-2007 ad

branches: 1.256.2;
Merge from vmlocking: fix a deadlock with (threaded) soft interrupts and
process exit.


Revision tags: yamt-x86pmap-base2
# 1.255 29-Sep-2007 dsl

Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.254 07-Sep-2007 rmind

branches: 1.254.2;
Implementation of POSIX message queues.

Reviewed by: <ad>, <tech-kern>


# 1.253 07-Aug-2007 ad

branches: 1.253.2;
- Fix a bug with _lwp_park() where if the computed wakeup time was under
1 microsecond into the future, the thread could enter an untimed sleep.
- Change the signature of _lwp_park() to accept an lwpid_t and second
hint pointer, but do so in a way that remains compatible with older
pthread libraries. This can be used to wake another thread before the
calling thread goes asleep, saving at least one syscall + involuntary
context switch. This turns out to be a fairly large win on the condvar
benchmarks that I have tried.
- Mark some more syscalls MP safe.
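
The first item describes a general class of bug: a short positive timeout is truncated to zero, and zero is the value the sleep primitive reads as "no timeout". The sketch below shows one way to guard against it. It is not the code from this commit; the function name, the microsecond granularity and the SLEEP_FOREVER_US convention are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SLEEP_FOREVER_US 0	/* hypothetical: 0 means "sleep until woken" */

/*
 * Convert a relative timeout in nanoseconds to whole microseconds.
 * Returns -1 if the deadline has already passed, otherwise a value
 * of at least 1 so that it can never be mistaken for "no timeout".
 */
static int64_t
park_timeout_us(int64_t delta_ns)
{
	int64_t us;

	if (delta_ns <= 0)
		return -1;		/* caller should report a timeout */
	us = delta_ns / 1000;		/* truncates: 999 ns becomes 0 us */
	if (us == 0)
		us = 1;			/* round sub-microsecond waits up */
	assert(us != SLEEP_FOREVER_US);
	return us;
}
```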


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.252 09-Jul-2007 ad

branches: 1.252.2; 1.252.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.251 03-Jun-2007 dsl

Split sys__lwp_park() so that the compat/netbsd32 code can copyin and convert
its timeout then call the standard function.


# 1.250 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.249 17-May-2007 yamt

mark lwp_exit() and exit1() __noreturn__.


# 1.248 08-May-2007 dsl

Add the child 'rusage' of an exiting process to its own 'rusage' exactly
once, and prior to passing it to the caller of sys_wait4() and at the same
time as adding it to the parent.
Commands like:
time sh -c 'i=0; while [ $i -lt 1000 ]; do i=$(expr $i + 1); done'
now give the same output.


# 1.247 07-May-2007 dsl

Split sys_wait4() so that compat code can fiddle with the returned 'status'
and 'rusage' without having to copy data to/from stackgap buffers.
The old split (find_stopped_child) could be removed.
amd64 seems to run netbsd32, linux and linux32 emulations. sparc64 compiles.


# 1.246 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


# 1.245 30-Apr-2007 rmind

Import of POSIX Asynchronous I/O.
Seems to be quite stable. Some work still left to do.

Please note, that syscalls are not yet MP-safe, because
of the file and vnode subsystems.

Reviewed by: <tech-kern>, <ad>


Revision tags: thorpej-atomic-base
# 1.244 11-Mar-2007 ad

branches: 1.244.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..


# 1.243 09-Mar-2007 ad

branches: 1.243.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


# 1.242 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


# 1.241 27-Feb-2007 yamt

typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.240 21-Feb-2007 thorpej

Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.


# 1.239 19-Feb-2007 cube

Introduce a new member to struct emul, e_startlwp, to be used by
sys__lwp_create. It allows using the said syscall under COMPAT_NETBSD32.

The libpthread regression tests now pass on amd64 and sparc64.


# 1.238 18-Feb-2007 dsl

The pre-kauth 'struct ucred' and 'struct pcred' are now only used in the
(deprecated some time ago) 'struct kinfo_proc' returned by sysctl.
Move the definitions to sys/sysctl.h and rename in order to ensure all the
users are located.


# 1.237 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.236 16-Feb-2007 ad

branches: 1.236.2;
proc_free() was returning a NULL rusage pointer to wait() when a traced
process was reparented. Change proc_free() to copy the rusage to a buffer
on the stack if required, so it can be passed both to the debugger and
to the real parent process.

Fixes kern/35582 (kernel panics with gdb).


# 1.235 15-Feb-2007 ad

Restore proc::p_userret in a limited way for Linux compat. XXX


# 1.234 11-Feb-2007 yamt

remove a forward decl of sa_emul.


Revision tags: post-newlock2-merge
# 1.233 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.232 22-Nov-2006 elad

branches: 1.232.2;
Make PaX MPROTECT use specificdata(9), freeing up two P_* flags.
While here, make more generic for upcoming PaX features.


# 1.231 23-Oct-2006 skrll

Remove chooselwp - it doesn't exist.


Revision tags: yamt-splraiseipl-base2
# 1.230 11-Oct-2006 thorpej

Don't free specificdata in lwp_exit2(); it's not safe to block there.
Instead, free an LWP's specificdata from lwp_exit() (if it is not the
last LWP) or exit1() (if it is the last LWP). For consistency, free the
proc's specificdata from exit1() as well. Add lwp_finispecific() and
proc_finispecific() functions to make this more convenient.


# 1.229 08-Oct-2006 christos

add {proc,lwp}_initspecific and use them to init proc0 and lwp0.


# 1.228 08-Oct-2006 thorpej

Add specificdata support to procs and lwps, each providing their own
wrappers around the specificdata subroutines. Also:
- Call the new lwpinit() function from main() after calling procinit().
- Move some pool initialization out of kern_proc.c and into files that
are directly related to the pools in question (kern_lwp.c and kern_ras.c).
- Convert uipc_sem.c to proc_{get,set}specific(), and eliminate the p_ksems
member from struct proc.


# 1.227 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.226 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 rpaulo-netinet-merge-pcb-base
# 1.225 30-Jul-2006 ad

branches: 1.225.4; 1.225.6;
Single-thread updates to the process credential.


# 1.224 21-Jul-2006 yamt

add ASSERT_SLEEPABLE() macro to assert we can sleep.


# 1.223 19-Jul-2006 ad

- Hold a reference to the process credentials in each struct lwp.
- Update the reference on syscall and user trap if p_cred has changed.
- Collect accounting flags in the LWP, and collate on LWP exit.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.222 16-May-2006 elad

Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.


# 1.221 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.220 11-May-2006 yamt

cleanup user.h.
- remove several #include which are not directly related to
this header anymore. tweak *.c accordingly.
- update comments.
- move some !_KERNEL #include to proc.h because it's more appropriate
place these days.
- whitespace.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.219 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.218 29-Mar-2006 cube

Rework the _lwp* and sa_* families of syscalls so some details can be
handled differently depending on the emulation. This paves the way for
COMPAT_NETBSD32 support of our pthread system.


# 1.217 20-Mar-2006 drochner

kill the last use of vm_fault_t, from Havard Eidnes


Revision tags: peter-altq-base yamt-pdpolicy-base2
# 1.216 07-Mar-2006 thorpej

branches: 1.216.2; 1.216.4;
Clean up fallout proc_is_traced_p() change:
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but do no more.


# 1.215 05-Mar-2006 christos

Add a proc_is_traced_p() macro and use it, instead of copying the same code
in many places. Idea from thorpej.


Revision tags: yamt-pdpolicy-base
# 1.214 05-Mar-2006 christos

branches: 1.214.2;
implement PT_SYSCALL


# 1.213 01-Mar-2006 yamt

merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.212 16-Feb-2006 perry

Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.


# 1.211 24-Dec-2005 perry

branches: 1.211.2; 1.211.4; 1.211.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.210 24-Dec-2005 yamt

fix a long-standing scheduler problem where p_estcpu is doubled
on each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.


# 1.209 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 ktrace-lwp-base
# 1.208 26-Nov-2005 simonb

Note that M_SUBPROC is only used on sparc/sparc64.


Revision tags: yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3
# 1.207 01-Nov-2005 yamt

branches: 1.207.2;
make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu values are unlikely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
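
The effect of moving to fixed point can be sketched as follows, assuming the classic BSD decay filter estcpu * loadfac / (loadfac + 1) with loadfac = 2 * loadavg. The FSHIFT_SK value of 11 and the function names are assumptions for illustration, not taken from this commit.

```c
#include <assert.h>
#include <stdint.h>

#define FSHIFT_SK	11			/* fractional bits (assumed) */
#define FSCALE_SK	(1u << FSHIFT_SK)	/* fixed-point 1.0 */

/*
 * Integer estimate: the quotient is truncated, so any nonzero value
 * loses at least 1 per pass, however close to 1 the decay factor is.
 * loadfac is 2 * loadavg as a plain integer.
 */
static unsigned
decay_int(unsigned loadfac, unsigned estcpu)
{
	return (loadfac * estcpu) / (loadfac + 1);
}

/*
 * Fixed-point estimate: both arguments are scaled by FSCALE_SK, so a
 * single pass can remove a fraction of one unit.
 */
static uint32_t
decay_fixpt(uint32_t loadfac, uint32_t estcpu)
{
	uint64_t r = ((uint64_t)loadfac * estcpu) /
	    ((uint64_t)loadfac + FSCALE_SK);

	assert(r <= estcpu);	/* decay never increases the estimate */
	return (uint32_t)r;
}
```

With a load average of 20 (loadfac 40) and an estimate of 10, the integer version drops a whole unit to 9, while the fixed-point version keeps a value between 9 and 10.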


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.206 28-Aug-2005 yamt

branches: 1.206.2;
protect p_nrlwps by sched_lock. no objection on tech-kern@. PR/29652.


# 1.205 19-Aug-2005 rpaulo

Correct typo in comments found by Roland Illig.


# 1.204 05-Aug-2005 junyoung

Move proc0 initialization from main() in init_main.c and proc0_insert() in
kern_proc.c into a new function proc0_init() in kern_proc.c, as suggested
on tech-kern@ days ago.


# 1.203 10-Jul-2005 christos

don't define syscall() here because the archs that don't have syscall_intern
yet, define syscall with different signatures in trap.c


# 1.202 10-Jul-2005 christos

No point in declaring syscall_intern and syscall in a zillion places.


# 1.201 29-May-2005 christos

branches: 1.201.2;
make ltsleep and wakeup* vars volatile.


# 1.200 20-May-2005 fvdl

Add an e_usertrap function pointer to struct emul.


Revision tags: kent-audio2-base
# 1.199 30-Mar-2005 christos

PR/19837: Stephen Ma: signal(SIGCHLD, SIG_IGN) should not create zombies.


Revision tags: yamt-km-base4
# 1.198 26-Mar-2005 fvdl

Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.


Revision tags: yamt-km-base3 netbsd-3-base
# 1.197 26-Feb-2005 perry

branches: 1.197.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.196 03-Feb-2005 perry

de-__P


Revision tags: yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.195 01-Oct-2004 yamt

branches: 1.195.4; 1.195.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
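
A minimal sketch of a marker-skipping iteration macro, written against a hypothetical singly linked node type rather than the real proclist. The names pnode, pnode_skip_markers, PNODE_FOREACH and pnode_count are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list node; the real list links struct proc entries. */
struct pnode {
	struct pnode	*next;
	bool		 is_marker;	/* placeholder left by an iterator */
	int		 pid;
};

/* Return the first non-marker node at or after n, or NULL. */
static struct pnode *
pnode_skip_markers(struct pnode *n)
{
	while (n != NULL && n->is_marker)
		n = n->next;
	return n;
}

/* Visit every real entry; markers are stepped over silently. */
#define PNODE_FOREACH(var, head)					\
	for ((var) = pnode_skip_markers(head);				\
	    (var) != NULL;						\
	    (var) = pnode_skip_markers((var)->next))

/* Example consumer: count the real entries on a list. */
static int
pnode_count(struct pnode *head)
{
	struct pnode *p;
	int n = 0;

	PNODE_FOREACH(p, head) {
		assert(!p->is_marker);
		n++;
	}
	return n;
}
```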


# 1.194 17-Sep-2004 enami

Put the type of p_tracep back to void *; it is an implementation detail and
no need to expose to the rest of kernel.


# 1.193 08-Aug-2004 jdolecek

pass the fork flags down to the emulation fork hook, so that emulation
code can use the information for setup


# 1.192 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-base
# 1.191 26-Mar-2004 drochner

branches: 1.191.2;
all ports define __HAVE_SIGINFO now, so remove the CPP conditionals


# 1.190 13-Feb-2004 wiz

Uppercase CPU, plural is CPUs.


# 1.189 22-Jan-2004 matt

Allow cpu_lwp_free to be a macro (for architectures which don't require
cpu_lwp_free to do anything).


# 1.188 11-Jan-2004 jdolecek

g/c process state SDEAD - it's not used anymore after 'reaper' removal


# 1.187 11-Jan-2004 jdolecek

ride 1.6ZH version bump - g/c some unused struct lwp and struct proc
fields (former reaper stuff)


# 1.186 04-Jan-2004 jdolecek

Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread


# 1.185 24-Dec-2003 manu

Move the sigfilter hook to a more adequate location, and rename it to better
fit what it does.

The softsignal feature is used in Darwin to trace processes. When the
traced process gets a signal, this raises an exception. The debugger will
receive the exception message, use ptrace with PT_THUPDATE to pass the
signal to the child or discard it, and then it will send a reply to the
exception message, to resume the child.

With the hook at the beginning of kpsignal2, we are in the context of the
signal sender, which can be the kill(1) command, for instance. We cannot
afford to sleep until the debugger tells us if the signal should be
delivered or not.

Therefore, the hook to generate the Mach exception must be in the traced
process context. That way we can sleep awaiting the debugger's opinion
about the signal without causing a problem. The hook is hence located in
issignal, at the place where normally SIGCHLD is sent to the debugger,
while the traced process is stopped. If the hook returns 0, we bypass
those operations; the Mach exception mechanism will take care of notifying
the debugger (through a Mach exception) and stopping the faulting thread.


# 1.184 20-Dec-2003 fvdl

Put back Emmanuel's sigfilter hooks, as decided by Core.


# 1.183 20-Dec-2003 manu

Introduce lwp_emuldata and the associated hooks. No hook is provided for the
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsibility of the emulation to
take appropriate action about lwp_emuldata in e_proc_exec.

Patch reviewed by Christos.


# 1.182 06-Dec-2003 atatat

The missing pieces of PROC_PID_STOPEXIT/P_STOPEXIT, a sysctl tweakable
flag that makes a process stop as it exits.


# 1.181 05-Dec-2003 jdolecek

back the sigfilter emulation hook change off


# 1.180 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.179 03-Dec-2003 manu

Add a sigfilter emulation hook. It is used at the beginning of kpsignal2()
so that a specific emulation has the opportunity to filter out some signals.

if sigfilter returns 0, then no signal is sent by kpsignal2().

There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.

This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)


# 1.178 27-Nov-2003 manu

Make the wakeup optional in proc_stop, so that it is possible to stop a
process without waking up its parent.


# 1.177 17-Nov-2003 christos

expose proc_stop. needed by mach/darwin emulation.


# 1.176 12-Nov-2003 dsl

- Count number of zombies and stopped children and requeue them at the top
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() so that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find will now do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.

Welcome to 1.6ZF


# 1.175 04-Nov-2003 dsl

Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
(pad fields left in struct proc to avoid kernel bump)
Somehow this file escaped the earlier commit (in spite of being in the cvs diff
I did beforehand!)


# 1.174 09-Oct-2003 yamt

tweak curproc not to reference curlwp twice.
(function calls might be accompanied by curlwp.)


# 1.173 26-Sep-2003 simonb

Fix "constify sendsig/trapsignal" fallout for non-siginfo'd archs. Test
compiled on most architectures.


# 1.172 25-Sep-2003 christos

constify sendsig/trapsignal [suggested by gimpy]


# 1.171 13-Sep-2003 jdolecek

actually remove p_dupfd from struct proc (oops)


# 1.170 06-Sep-2003 christos

SA_SIGINFO changes. This is 1.5Z


# 1.169 24-Aug-2003 chs

add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.


# 1.168 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.167 08-Jul-2003 itojun

prototype must not carry variable name


# 1.166 29-Jun-2003 fvdl

branches: 1.166.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.165 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.164 03-Jun-2003 christos

pad the flag arguments to 8 hex chars.


# 1.163 22-Mar-2003 jdolecek

for NO_PGID, use ((pid_t)-1) rather than (-(pid_t)1)


# 1.162 19-Mar-2003 dsl

Alternative pid/proc allocator, removes all searches associated with pid
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.


# 1.161 12-Mar-2003 dsl

Add pgid_in_session() for validating TIOCSPGRP requests
(approved by christos)


# 1.160 18-Feb-2003 dsl

KNF kern_prot.c


# 1.159 15-Feb-2003 dsl

Fix support of 15 and 16 character lognames.
Warn if the logname is changed within a session - usually a missing setsid.
(approved by christos)


# 1.158 14-Feb-2003 dsl

Split sys_wait4 so that code isn't duplicated in compat tree.
(approved by christos)


# 1.157 04-Feb-2003 yamt

constify wait channels of ltsleep/wakeup. they are never dereferenced.


# 1.156 01-Feb-2003 thorpej

Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
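
As a rough illustration of the idea, a malloc type passed by pointer with an adjustable limit might look like the sketch below. All names here (malloc_type_sketch, MALLOC_DEFINE_LIMIT_SKETCH, malloc_type_setlimit_sketch, malloc_charge_sketch) are hypothetical and chosen for this example; the real kernel interface is not reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down malloc type: a name, a limit and a usage counter. */
struct malloc_type_sketch {
	const char *ks_shortdesc;
	unsigned long ks_limit;	/* 0 means "no limit" in this sketch */
	unsigned long ks_inuse;
};

/* Define a type together with its initial limit. */
#define MALLOC_DEFINE_LIMIT_SKETCH(var, desc, limit) \
	struct malloc_type_sketch var = { (desc), (limit), 0 }

MALLOC_DEFINE_LIMIT_SKETCH(M_DEMO_SKETCH, "demo", 64);

/* Adjust the limit later with a function call. */
void
malloc_type_setlimit_sketch(struct malloc_type_sketch *type,
    unsigned long limit)
{
	assert(type != NULL);
	type->ks_limit = limit;
}

/* Account for a request; returns 1 if it fits under the limit, else 0. */
int
malloc_charge_sketch(struct malloc_type_sketch *type, unsigned long size)
{
	assert(type != NULL);
	if (type->ks_limit != 0 && type->ks_inuse + size > type->ks_limit)
		return 0;
	type->ks_inuse += size;
	return 1;
}
```

Because callers hold a pointer rather than an integer index, new types can be defined anywhere without touching a central table.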


# 1.155 24-Jan-2003 thorpej

Add a pointer to p1003.1b semaphore data.


# 1.154 22-Jan-2003 yamt

make KSTACK_CHECK_* compile after sa merge.


# 1.153 18-Jan-2003 thorpej

Merge the nathanw_sa branch.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base nathanw_sa_base
# 1.152 21-Dec-2002 gmcgarry

Re-add yield(). Only used by compat code at the moment.


# 1.151 21-Dec-2002 manu

Comment what e_fault in struct emul does


# 1.150 20-Dec-2002 gmcgarry

Remove yield() until the scheduler supports the sched_yield(2) system
call.


Revision tags: gmcgarry_ctxsw_base gmcgarry_ucred_base
# 1.149 12-Dec-2002 jdolecek

branches: 1.149.2;
replace magic number '500' in pid allocation code with a macro PID_SKIP,
defined in <sys/proc.h> (along PID_MAX, NO_PID)


# 1.148 07-Nov-2002 manu

Added two sysctl-able flags: proc.curproc.stopfork and proc.curproc.stopexec
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.

This feature is designed so that it is easy to attach the process using gdb
before it has done anything.

It works also with sproc, kthread_create, clone...


Revision tags: kqueue-aftermerge
# 1.147 23-Oct-2002 jdolecek

merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe


Revision tags: kqueue-beforemerge kqueue-base
# 1.146 22-Sep-2002 gmcgarry

Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.


# 1.145 21-Sep-2002 manu

- Introduce an e_fault field in struct proc to provide emulation specific
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.

- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports

- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));

- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.


Revision tags: gehenna-devsw-base
# 1.144 28-Aug-2002 gmcgarry

MI kernel support for user-level Restartable Atomic Sequences (RAS).


# 1.143 06-Aug-2002 pooka

Add FORK_CLEANFILES flag to fork1(), which makes the new process start out
with a clean descriptor set (ie. not copied or shared from parent).

for rfork()


# 1.142 25-Jul-2002 jdolecek

Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.


# 1.141 11-Jul-2002 pooka

Add FORK_NOWAIT flag, which sets init as the parent of the forked
process. Useful for FreeBSD rfork() emulation.

ok'd by Christos


# 1.140 04-Jul-2002 thorpej

Add kernel support for having userland provide the signal trampoline:

* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.

Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.
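
The version-based trampoline choice described in the list above can be sketched like this. The descriptor layout and the names sigdesc_sketch and select_trampoline_sketch() are assumptions made for illustration; only the rule "version 0 means the legacy kernel-provided trampoline" comes from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-signal descriptor; field names are illustrative. */
struct sigdesc_sketch {
	void (*sd_handler)(int);	/* stands in for the full sigaction */
	const void *sd_tramp;		/* trampoline registered from userland */
	int sd_vers;			/* 0 = legacy kernel-provided trampoline */
};

/* Pick the trampoline a sendsig()-like routine would jump through. */
const void *
select_trampoline_sketch(const struct sigdesc_sketch *sd,
    const void *kernel_tramp)
{
	assert(sd != NULL);

	if (sd->sd_vers == 0)
		return kernel_tramp;
	return sd->sd_tramp;
}
```

A real sendsig() would also pick the trampoline's arguments based on the version, which this sketch leaves out.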


# 1.139 02-Jul-2002 yamt

add KSTACK_CHECK_MAGIC. discussed on tech-kern.


# 1.138 17-Jun-2002 christos

Systrace support.


Revision tags: netbsd-1-6-base
# 1.137 02-Apr-2002 jdolecek

branches: 1.137.2; 1.137.4;
move emulation-specific sysctl hook from struct execsw to struct emul,
where it belongs


Revision tags: eeh-devprop-base newlock-base ifpoll-base
# 1.136 11-Jan-2002 christos

branches: 1.136.4;
Fix a ptrace/execve race that could be used to modify the child process's
image during execve. This is a security issue because one can
do that to setuid programs... From FreeBSD.


# 1.135 08-Dec-2001 thorpej

Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).


Revision tags: thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2
# 1.134 18-Sep-2001 jdolecek

Make the setregs hook emulation-specific, rather than executable
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.133 18-Jun-2001 christos

branches: 1.133.2; 1.133.4;
Add an e_trapsignal member to struct emul, so that emulated processes can
send the appropriate signal depending on the trap type.


# 1.132 16-Jun-2001 manu

Removed the obsolete EMUL_NO_BSD_ASYNCIO_PIPE and EMUL_NO_SIGIO_ON_READ flags.
Async I/O OS specificities should now be handled in OS specific code. Linux
has been done, but other emulations should be handled. See case LINUX_F_SETFL
in sys/compat/linux/common/linux_file.c:linux_sys_fcntl() for more details.

The data that has been collected so far:

                                    Net  Free  Open  Linux  SunOS  AIX  OSF1  Darwin
send SIGIO to write end of pipe     Y    N     N     N      N      N    Y     Y
send SIGIO to read end of pipe      Y    Y     N     N      N      ?    Y     ?
send SIGIO to write end of socket   Y    Y     Y     N      N      Y    Y     Y
send SIGIO to read end of socket    Y    Y     Y     Y      Y      ?    Y     ?


# 1.131 30-May-2001 mrg

use _KERNEL_OPT


# 1.130 19-May-2001 manu

Backed out a previous commit that was incomplete and hence broke several
emulation package builds


# 1.129 19-May-2001 manu

Moved e_flags outside of ifdef __HAVE_MINIMAL_EMUL in struct emul
and removed an ifdef that was taking care of this problem


# 1.128 07-May-2001 manu

Changed EMUL_BSD_ASYNCIO_PIPE to EMUL_NO_BSD_ASYNCIO_PIPE, so that
the native emulation (NetBSD) does not have a flag.


# 1.127 06-May-2001 manu

Added two flags to emulation packages:

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.

EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchronous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.


# 1.126 30-Apr-2001 lukem

remove some lint


Revision tags: thorpej_scsipi_beforemerge
# 1.125 23-Apr-2001 simonb

Add a comment for p_comm, from Bill Sommerfeld.


Revision tags: thorpej_scsipi_nbase thorpej_scsipi_base
# 1.124 04-Mar-2001 matt

branches: 1.124.2;
ifndef some more routines that are macros on the vax port.


# 1.123 27-Feb-2001 lukem

revert part of previous and change cpu_wait prototype back to using __P():
void cpu_wait __P((struct proc *));
until there's consensus on the correct way to fix this, ports that
#define cpu_wait should at least be able to compile again.


# 1.122 26-Feb-2001 lukem

convert to ANSI KNF


# 1.121 25-Jan-2001 jdolecek

Make e_errno of struct emul 'const int *' (was 'int *'), since the errno
mapping tables were constified recently.
This fixes compile problem reported by Ken Wellsch on current-users@.


# 1.120 25-Jan-2001 jdolecek

move misplaced comment to where it belongs


# 1.119 22-Dec-2000 jdolecek

struct proc: g/c p_unused


# 1.118 22-Dec-2000 jdolecek

split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.


# 1.117 19-Dec-2000 scw

Change struct emul's "char e_name[8]" field to "const char *e_name"
to allow for emulation names >= 8 characters.


# 1.116 11-Dec-2000 mycroft

Introduce 2 new flags in types.h:
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.


# 1.115 09-Dec-2000 jdolecek

change the type of e_syscall in struct emul to
void (*e_syscall) __P((void))
since it's not uniform between ports


# 1.114 09-Dec-2000 mycroft

Nuke some emul flags.


# 1.113 01-Dec-2000 jdolecek

add three emul flags:
EMUL_HAS_SYS___syscall - has SYS___syscall
EMUL_GETPID_PASS_PPID - pass parent pid in getpid()
EMUL_GETID_PASS_EID - pass also effective id in get[ug]id()


# 1.112 01-Dec-2000 jdolecek

add e_path (emulation path) to struct emul, which replaces emulation-specific
*_emul_path variables

change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation

remove no longer needed header files

add e_flags and e_syscall to struct emul; these are unused and empty for now


# 1.111 21-Nov-2000 jdolecek

restructure struct emul and execsw, in preparation to make emulations LKMable:
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] now has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures


# 1.110 19-Nov-2000 sommerfeld

Back out mistaken commits.


# 1.109 19-Nov-2000 sommerfeld

Extend kinfo_proc2 with CPU id


# 1.108 16-Nov-2000 jdolecek

pass pointer to used exec_package to emulation-specific exec hook -
emulation code may make decisions based on e.g. exec format


# 1.107 13-Nov-2000 jdolecek

change the type of *syscallnames[] array to 'const char * const foo[]'


# 1.106 07-Nov-2000 jdolecek

add void *p_emuldata into struct proc - this can be used to hold per-process
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependent switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different

This was discussed on tech-kern.


# 1.105 05-Sep-2000 bouyer

Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.


# 1.104 05-Sep-2000 bouyer

Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.


# 1.103 31-Aug-2000 bouyer

Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN and clears the
P_SUSPEND flag.


# 1.102 22-Aug-2000 thorpej

Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.


# 1.101 12-Aug-2000 thorpej

Don't bother with a trampoline to start the pagedaemon and
reaper threads.


# 1.100 12-Aug-2000 sommerfeld

Add P_BIGLOCK process flag, indicating that the processor should hold
the kernel "big lock" when running this process.
(this is largely a placeholder for now; big lock code will be added later).


# 1.99 07-Aug-2000 thorpej

It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.


Revision tags: netbsd-1-5-base
# 1.98 08-Jun-2000 thorpej

branches: 1.98.2;
Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
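
One way to picture the compatibility macro is the sketch below. It assumes the old API maps to "no interlock" (a NULL pointer). The lock type and the _sketch names are hypothetical, and the body models only the interlock release, not real sleeping or scheduler locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a simple lock. */
struct simplelock_sketch {
	int held;
};

/*
 * Sketch of the new entry point.  A real implementation would lock the
 * scheduler, then drop the interlock, then put the caller to sleep; here
 * only the interlock release is modelled.
 */
int
ltsleep_sketch(const void *chan, int pri, const char *wmesg, int timo,
    struct simplelock_sketch *interlock)
{
	(void)chan;
	(void)pri;
	(void)wmesg;
	(void)timo;

	if (interlock != NULL) {
		assert(interlock->held);	/* caller must hold it on entry */
		interlock->held = 0;
	}
	return 0;
}

/* The old four-argument API, expressed as "no interlock". */
#define tsleep_sketch(chan, pri, wmesg, timo) \
	ltsleep_sketch((chan), (pri), (wmesg), (timo), NULL)
```

Existing callers keep compiling unchanged, while MP-aware callers can pass the lock that protects their wakeup condition.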


# 1.97 31-May-2000 thorpej

Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.


# 1.96 28-May-2000 thorpej

Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.


Revision tags: minoura-xpg4dl-base
# 1.95 27-May-2000 thorpej

branches: 1.95.2;
All users of the old sleep() are now gone; nuke it.


# 1.94 27-May-2000 sommerfeld

Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()


# 1.93 26-May-2000 thorpej

First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.


# 1.92 26-May-2000 simonb

Add some new sysctls to help abolish the dreaded "proc size mismatch"
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessible with /dev/kmem read
access. The sysctls are:

+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.

With input and suggestions from many people on tech-kern.
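
The element-size argument to KERN_PROC2 is what keeps old binaries working when the structure grows. A minimal sketch of that idea follows, assuming the kernel copies at most the size the caller asked for. The structure layout and copyout_elem_sketch() are hypothetical, and zero-filling any excess is a choice made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical current layout: a newer kernel appended kp_cpuid. */
struct kinfo_sketch {
	int kp_pid;
	int kp_uid;
	int kp_cpuid;
};

/*
 * Fill one caller-sized element.  Copies min(elemsize, sizeof(*src)) bytes
 * and returns that count; bytes beyond the kernel's structure are zeroed.
 */
size_t
copyout_elem_sketch(void *dst, size_t elemsize, const struct kinfo_sketch *src)
{
	size_t n;

	assert(dst != NULL && src != NULL);

	n = elemsize < sizeof(*src) ? elemsize : sizeof(*src);
	memset(dst, 0, elemsize);
	memcpy(dst, src, n);
	return n;
}
```

An old binary that only knows the first two fields passes its smaller size and still gets a correctly laid out prefix.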


# 1.91 26-May-2000 thorpej

Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.


# 1.90 10-Apr-2000 thorpej

Make `whichqs' volatile so that C code can safely loop around it.


# 1.89 28-Mar-2000 simonb

Remove duplicate declaration of uvm_swapin() - it's in <uvm/uvm_extern.h>.
Extern the declaration of initproc.


# 1.88 23-Mar-2000 thorpej

Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.


# 1.87 23-Mar-2000 thorpej

New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.


Revision tags: chs-ubc2-newbase
# 1.86 11-Feb-2000 thorpej

Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
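
The sizing rule amounts to a divide followed by a clamp. A small sketch, with nkmempages_sketch() and its parameter names invented for this example (the bounds stand in for the machine-dependent or NKMEMPAGES_MIN/NKMEMPAGES_MAX values):

```c
#include <assert.h>

/*
 * Take a quarter of physical memory (in pages), then clamp the result to
 * the supplied lower and upper bounds.
 */
unsigned long
nkmempages_sketch(unsigned long physpages, unsigned long min_pages,
    unsigned long max_pages)
{
	unsigned long n;

	assert(min_pages <= max_pages);

	n = physpages / 4;
	if (n > max_pages)
		n = max_pages;
	if (n < min_pages)
		n = min_pages;
	return n;
}
```

For instance, with 4000 physical pages and bounds of 100 and 500, the quarter (1000) exceeds the upper bound, so the result is 500.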


# 1.85 06-Feb-2000 eeh

Add new P_32 flag for processes running 32-bit emulation.


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.84 28-Sep-1999 bouyer

branches: 1.84.2;
Replace the kern.shortcorename sysctl with a more flexible scheme,
core filename format, which allows changing the name of the core dump
and relocating it to a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). The process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.


# 1.83 10-Aug-1999 thorpej

Pull in <machine/cpu.h> in the MULTIPROCESSOR case to get curcpu() for
use in the `curproc' declaration. Note that machine-dependent code can
still override `curproc' in the single- and multi-processor case as before,
for its own convenience (the SPARC port does this, for example).


Revision tags: chs-ubc2-base
# 1.82 26-Jul-1999 thorpej

Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.


# 1.81 25-Jul-1999 thorpej

Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.


# 1.80 22-Jul-1999 thorpej

Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.


# 1.79 22-Jul-1999 thorpej

Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
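
The distinction between "looks like a zombie" and "can be collected by wait4" can be sketched as follows. The state values, the structure, and the names P_ZOMBIE_SKETCH and can_collect_sketch() are hypothetical; only the SZOMB/SDEAD semantics come from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state values; only their distinctness matters here. */
enum pstat_sketch { SRUN_SK = 1, SSLEEP_SK, SZOMB_SK, SDEAD_SK };

struct proc_sketch {
	enum pstat_sketch p_stat;
};

/* True for a process that is dead, whether or not the reaper has run. */
#define P_ZOMBIE_SKETCH(p) \
	((p)->p_stat == SZOMB_SK || (p)->p_stat == SDEAD_SK)

/* wait4-style check: only a process the reaper has handled is collectable. */
int
can_collect_sketch(const struct proc_sketch *p)
{
	assert(p != NULL);
	return p->p_stat == SZOMB_SK;
}
```

Code that merely needs to skip dead processes would use the macro, while the wait path needs the stricter test.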


# 1.78 15-Jul-1999 thorpej

A few things to make the Linux clone(2) emulation work a bit better:
- When the exit signal is specified to be 0, don't just assume they
meant SIGCHLD. In the Linux world, this appears to mean "don't deliver
an exit signal at all".
- Simplify P_EXITSIG(); don't check against initproc here, just change
the exit signal to SIGCHLD if reparenting to initproc.

A very simple clone(2) test program now works, and the MpegTV package
starts, but doesn't run properly yet (I believe there is a separate
bug which keeps it from working properly).


# 1.77 13-May-1999 thorpej

Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).


# 1.76 13-May-1999 thorpej

Allow an alternate exit signal (i.e. not SIGCHLD) to be delivered to the
parent, specified at fork time. Specify a new flag to wait4(2), WALTSIG,
to wait for processes which use an alternate exit signal.

This is required for clone(2).


# 1.75 30-Apr-1999 thorpej

Make the proc structure reference the new cwdinfo structure, and define
a few more sharing flags for fork1().


Revision tags: netbsd-1-4-PATCH002 kame_141_19991130 netbsd-1-4-PATCH001 kame_14_19990705 kame_14_19990628 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.74 25-Mar-1999 sommerfe

branches: 1.74.2; 1.74.4;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.


# 1.73 24-Mar-1999 mrg

completely remove Mach VM support. all that is left is the
header files as UVM still uses (most of) these.


# 1.72 25-Jan-1999 kleink

Adapt the System V behaviour of a child process inheriting its parent's
ucontext link but still reset it on exec().


# 1.71 23-Jan-1999 sommerfe

Tweak to earlier fix to p_estcpu:
- no longer conditionalized
- when traced, charge time to real parent, not debugger
- make it clear for future rototillers that p_estcpu should be moved
to the "copy" region of struct proc.


# 1.70 21-Jan-1999 christos

Add p_ctxlink void * member to keep the struct ucontext uc_link member,
used in svr4 emulation.


Revision tags: kenh-if-detach-base
# 1.69 11-Nov-1998 thorpej

Move fork_kthread() to a new file, kern_kthread.c, and rename it to
kthread_create(). Implement kthread_exit() (causes a thread to exit).
Set P_NOCLDWAIT on kernel threads, which will cause any of their children
to be reparented to init(8) (which is already prepared to wait out orphaned
processes).


# 1.68 11-Nov-1998 thorpej

Initial version of API for creating kernel threads (likely to change somewhat
in the future):
- New function, fork_kthread(), takes entry point, argument for entry point,
and comment for new proc. May be called by any context, will fork the
thread from proc0 (requires slight changes to cpu_fork()).
- cpu_set_kpc() now takes a third argument, a void *arg to pass to the
thread entry point. Thread entry point now takes void * instead of
struct proc *.
- Create the pagedaemon and reaper kernel threads using fork_kthread().


Revision tags: chs-ubc-base
# 1.67 19-Oct-1998 pk

Allow `curproc' to be defined in <machine/proc.h> to enable a transition
to SMP support.


# 1.66 18-Sep-1998 christos

Add NOCLDWAIT (from FreeBSD)


# 1.65 11-Sep-1998 mycroft

Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.


# 1.64 08-Sep-1998 thorpej

- Add a new proclist, deadproc, which holds dead-but-not-yet-zombie
processes.
- Create a new data structure, the proclist_desc, which contains a
pointer to a proclist, and eventually, a pointer to the lock for that
proclist. Declare a static array of proclist_descs, proclists[],
consisting of allproc, deadproc, and zombproc.


# 1.63 01-Sep-1998 thorpej

Use the pool allocator and the "nointr" pool page allocator for rusage
structures.


# 1.62 31-Aug-1998 thorpej

Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.


# 1.61 02-Aug-1998 thorpej

Use a pool for proc structures.


Revision tags: eeh-paddr_t-base
# 1.60 02-May-1998 christos

fktrace changes.


# 1.59 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.58 14-Feb-1998 thorpej

Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.


# 1.57 10-Feb-1998 mrg

- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.


# 1.56 05-Feb-1998 mrg

initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)


# 1.55 05-Jan-1998 thorpej

Also pass fork1() a struct proc **, in case the caller wants a pointer
to the newly created process.


# 1.54 04-Jan-1998 thorpej

Define flags passed to fork1(). Currently "block parent" and "share vmspace"
are defined.


Revision tags: netbsd-1-3-PATCH003 netbsd-1-3-PATCH003-CANDIDATE2 netbsd-1-3-PATCH003-CANDIDATE1 netbsd-1-3-PATCH003-CANDIDATE0 netbsd-1-3-PATCH002 netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base marc-pcmcia-base
# 1.53 10-Oct-1997 mycroft

GC pageproc and bclnlist.


# 1.52 09-Oct-1997 mycroft

Make wmesg arguments to various functions const.


# 1.51 11-Sep-1997 mycroft

Fix execve(2) and *setregs() interfaces so emulations can set registers in a
more correct way. (See tech-kern.)


Revision tags: thorpej-signal-base marc-pcmcia-bp
# 1.50 06-Jul-1997 fvdl

branches: 1.50.2; 1.50.4;
Add lock count fields to proc structure. Always define NCPU to 1 for now
in lock.h


# 1.49 28-Apr-1997 mycroft

Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.


# 1.48 28-Apr-1997 mycroft

Remove remnants of P_FSTRACE, which is no longer used.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.47 06-Nov-1996 cgd

Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue(). Also, move remrunqueue() prototype from vm/vm_extern.h
to sys/proc.h, so that it's in the same place as the setrunqueue() prototype
and other related prototypes.


# 1.46 02-Oct-1996 ws

Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.


# 1.45 07-Sep-1996 mycroft

Implement poll(2).


Revision tags: netbsd-1-2-PATCH001 netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.44 22-Apr-1996 christos

add prototypes from <sys/cpu.h> to the appropriate places


# 1.43 14-Mar-1996 christos

filedesc.h, proc.h: Rename fdopen() to filedescopen() so that it does not
conflict with the floppy driver.
conf.h: Protect against multiple inclusions. The reason will become apparent
soon.
systm.h: Bring Debugger() prototype into scope.


# 1.42 09-Feb-1996 christos

Filesystem prototype changes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.41 13-Aug-1995 mycroft

Add PHOLD() and PRELE() macros, used to hold a process in core and release it.


# 1.40 22-Apr-1995 christos

- new struct emul for OS emulations.
- deprecated exec_setup_fcn
- deprecated EMUL_???
- added sunos_machdep.c for the m68k ports.


# 1.39 13-Apr-1995 mycroft

EMUL_IBCS2_ELF -> EMUL_SVR4; EMUL_IBCS2_{COFF,XOUT} -> EMUL_IBCS2


# 1.38 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.37 28-Feb-1995 cgd

add an EMUL constant for Linux emulation


# 1.36 08-Jan-1995 cgd

light cleanup, related to spacing...


# 1.35 24-Dec-1994 cgd

various function definitions.


# 1.34 30-Oct-1994 cgd

DTRT with thread id.


# 1.33 05-Sep-1994 mycroft

New iBCS2 code from Scott.


# 1.32 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


# 1.31 15-Aug-1994 mycroft

Add EMUL_IBCS2_COFF, and rename EMUL_IBCS2 to EMUL_IBCS2_ELF.


# 1.30 14-Aug-1994 cgd

add a new p_emul value, clean up slightly.


Revision tags: netbsd-1-0-base
# 1.29 29-Jun-1994 cgd

branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.28 27-Jun-1994 cgd

new standard, minimally intrusive ID format


# 1.27 15-Jun-1994 mycroft

Turn P_NOSWAP and P_PHYSIO into a hold count, as suggested by a comment.


# 1.26 22-May-1994 deraadt

add EMUL_IBCS2


# 1.25 21-May-1994 glass

add ultrix emulation flag


# 1.24 21-May-1994 cgd

update to 4.4-Lite; no serious changes


# 1.23 13-May-1994 cgd

kill 3 bogons, note more to go...


# 1.22 05-May-1994 mycroft

Now setpri() is really toast.


# 1.21 05-May-1994 cgd

lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.


# 1.20 04-May-1994 cgd

Rename a lot of process flags.


# 1.19 29-Apr-1994 cgd

kill syscall name aliases. no user-visible changes


Revision tags: nvm-base wnvm
# 1.18 06-Apr-1994 cgd

branches: 1.18.2;
add SUGID


# 1.17 20-Jan-1994 ws

Make procfs really work for debugging.
Implement note & notepg files in procfs.


# 1.16 08-Jan-1994 mycroft

Move some prototypes to a better location.


# 1.15 08-Jan-1994 cgd

core reorg


# 1.14 04-Jan-1994 cgd

field name change


# 1.13 22-Dec-1993 cgd

add proto for proc_reparent() function from jsp.
he gave us the function, but i'm not sure exactly where the proto
should go...


# 1.12 21-Dec-1993 mycroft

All the world is *not* an i386.


# 1.11 21-Dec-1993 cgd

move EMUL_* definitions to a sane location, and fix them up some


# 1.10 21-Dec-1993 cgd

move things around as appropriate, add 7 more spares (to round to 256)


# 1.9 21-Dec-1993 cgd

delete stupidity, add a few fields


# 1.8 12-Dec-1993 deraadt

add per-process emulation variable
support for OMAGIC/NMAGIC executables
STACKGAP support needed by compatibility functions


Revision tags: magnum-base
# 1.7 15-Sep-1993 cgd

make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.6 27-Jun-1993 andrew

branches: 1.6.4;
ANSIfications - lots of function prototyping.


# 1.5 20-May-1993 cgd

add rcs ids as necessary, and also clean up headers


# 1.4 20-May-1993 cgd

have proc.h, socketvar.h, tty.h include select.h automatically


# 1.3 15-May-1993 cgd

fix the fact that p_wmesg was in the wrong section of the proc struct


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision