Cross Reference: /freebsd-11-stable/sys/kern/kern

History log of /freebsd-11-stable/sys/kern/kern_proc.c
Revision	Date	Author	Comments (<<< Hide modified files) (Show modified files >>>)
# 343948	10-Feb-2019	kib	MFC r343724: Do not call PHOLD() while owning the allproc_lock sx.
# 336204	11-Jul-2018	kib	MFC r335935: Add a way for the process to request cleanup of the kernel cache of the process arguments.
# 335820	30-Jun-2018	kib	MFC r335504: fork: avoid endless wait with PTRACE_FORK and RFSTOPPED.
# 331727	29-Mar-2018	mjoras	MFC r325621, r325622, r331227 Add EVENTHANDLER_LIST and some users. Also fix a longstanding bug in mtx initialization.
# 331722	29-Mar-2018	eadler	Revert r330897: This was intended to be a non-functional change. It wasn't. The commit message was thus wrong. In addition it broke arm, and merged crypto related code. Revert with prejudice. This revert skips files touched in r316370 since that commit was since MFCed. This revert also skips files that require $FreeBSD$ property changes. Thank you to those who helped me get out of this mess including but not limited to gonzo, kevans, rgrimes. Requested by: gjb (re)
# 330897	14-Mar-2018	eadler	Partial merge of the SPDX changes These changes are incomplete but are making it difficult to determine what other changes can/should be merged. No objections from: pfg
# 328571	29-Jan-2018	jhb	MFC 327561: Report offset relative to the backing object for kinfo_vmentry structures. For the pathname reported in kinfo_vmentry structures (kve_path), the sysctl handlers walk the object chain to find the bottom-most VM object. This permits a COW mapping of a file with dirty pages to report the pathname of the originally mapped file. Do the same for the object offset (kve_offset) computing a cumulative offset during the same object walk so that the reported offset is relative to the reported pathname. Note that ptrace(PT_VM_ENTRY) already returns a cumulative offset rather than the raw offset of the VM map entry. Note also that this does not affect procstat -v output (even structured output) since that output does not include the kve_offset field. Sponsored by: DARPA / AFRL
# 327547	04-Jan-2018	kib	MFC r327285: Make kern_proc_vmmap_resident() externally accesible, and move the vmmap_skip_res_cnt control check inside it.
# 327404	31-Dec-2017	mjg	MFC r323234,r323305,r323306,r324044: Start annotating global _padalign locks with __exclusive_cache_line While these locks are guarnteed to not share their respective cache lines, their current placement leaves unnecessary holes in lines which preceeded them. For instance the annotation of vm_page_queue_free_mtx allows 2 neighbour cachelines (previously separate by the lock) to be collapsed into 1. The annotation is only effective on architectures which have it implemented in their linker script (currently only amd64). Thus locks are not converted to their not-padaligned variants as to not affect the rest. ============= Annotate global process locks with __exclusive_cache_line ============= Annotate Giant with __exclusive_cache_line ============= Annotate sysctlmemlock with __exclusive_cache_line.
# 326242	27-Nov-2017	delphij	MFC r325755: Be more careful when doing calculation with request from userland.
# 324640	15-Oct-2017	brooks	MFC r320999: Add 32-bit compat for kinfo_proc's ki_tdaddr. This appears to have been an oversight in r213536. Reviewed by: markj Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D11521
# 319778	10-Jun-2017	kib	MFC r319518: Ensure that cached struct thread does not keep spurious td_su reference on an UFS mount point. MFC r319519: Clean possible td_su reference on the struct mount being unmounted as the last step of ffs_unmount(). Approved by: re (gjb)
# 316073	28-Mar-2017	kib	MFC r315281: Use atop() instead of OFF_TO_IDX() for convertion of addresses or addresses offsets, as intended. MFC r315580 (by alc): Simplify the logic for clipping the range returned by the pager to fit within the map entry. Use atop() rather than OFF_TO_IDX() on addresses.
# 315259	14-Mar-2017	hselasky	MFC r313941: Make sure the thread constructor and destructor eventhandlers are called for all threads belonging to a procedure. Currently the first thread in a procedure is kept around as an optimisation step and is never freed. Because the first thread in a procedure is never freed nor allocated, its destructor and constructor callbacks are never called which means per thread structures allocated by dtrace and the Linux emulation layers for example, might be present for threads which don't need these structures. This patch adds a thread construction and destruction call for the first thread in a procedure. Tested: dtrace, linux emulation Reviewed by: kib @ Sponsored by: Mellanox Technologies
# 310120	15-Dec-2016	vangyzen	MFC r309676 Export the whole thread name in kinfo_proc kinfo_proc::ki_tdname is three characters shorter than thread::td_name. Add a ki_moretdname field for these three extra characters. Add the new field to kinfo_proc32, as well. Update all in-tree consumers to read the new field and assemble the full name, except for lldb's HostThreadFreeBSD.cpp, which I will handle separately. Bump __FreeBSD_version. Sponsored by: Dell EMC
# 304843	26-Aug-2016	kib	MFC r303382: Provide the getboottime(9) and getboottimebin(9) KPI. MFC r303387: Prevent parallel tc_windup() calls. Keep boottime in timehands, and adjust it from tc_windup(). MFC notes: The boottime and boottimebin globals are still exported from the kernel dyn symbol table in stable/11, but their declarations are removed from sys/time.h. This preserves KBI but not KPI, while all in-tree consumers are converted to getboottime(). The variables are updated after tc_setclock_mtx is dropped, which gives approximately same unlocked bugs as before. The boottime and boottimebin locals in several sys/kern_tc.c functions were renamed by adding the '_x' suffix to avoid name conficts.
# 302408	07-Jul-2016	gjb	Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here. Additional commits post-branch will follow. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation /freebsd-11-stable/MAINTAINERS /freebsd-11-stable/cddl /freebsd-11-stable/cddl/contrib/opensolaris /freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print /freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs /freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs /freebsd-11-stable/contrib/amd /freebsd-11-stable/contrib/apr /freebsd-11-stable/contrib/apr-util /freebsd-11-stable/contrib/atf /freebsd-11-stable/contrib/binutils /freebsd-11-stable/contrib/bmake /freebsd-11-stable/contrib/byacc /freebsd-11-stable/contrib/bzip2 /freebsd-11-stable/contrib/com_err /freebsd-11-stable/contrib/compiler-rt /freebsd-11-stable/contrib/dialog /freebsd-11-stable/contrib/dma /freebsd-11-stable/contrib/dtc /freebsd-11-stable/contrib/ee /freebsd-11-stable/contrib/elftoolchain /freebsd-11-stable/contrib/elftoolchain/ar /freebsd-11-stable/contrib/elftoolchain/brandelf /freebsd-11-stable/contrib/elftoolchain/elfdump /freebsd-11-stable/contrib/expat /freebsd-11-stable/contrib/file /freebsd-11-stable/contrib/gcc /freebsd-11-stable/contrib/gcclibs/libgomp /freebsd-11-stable/contrib/gdb /freebsd-11-stable/contrib/gdtoa /freebsd-11-stable/contrib/groff /freebsd-11-stable/contrib/ipfilter /freebsd-11-stable/contrib/ldns /freebsd-11-stable/contrib/ldns-host /freebsd-11-stable/contrib/less /freebsd-11-stable/contrib/libarchive /freebsd-11-stable/contrib/libarchive/cpio /freebsd-11-stable/contrib/libarchive/libarchive /freebsd-11-stable/contrib/libarchive/libarchive_fe /freebsd-11-stable/contrib/libarchive/tar /freebsd-11-stable/contrib/libc++ /freebsd-11-stable/contrib/libc-vis /freebsd-11-stable/contrib/libcxxrt /freebsd-11-stable/contrib/libexecinfo /freebsd-11-stable/contrib/libpcap /freebsd-11-stable/contrib/libstdc++ /freebsd-11-stable/contrib/libucl /freebsd-11-stable/contrib/libxo /freebsd-11-stable/contrib/llvm /freebsd-11-stable/contrib/llvm/projects/libunwind /freebsd-11-stable/contrib/llvm/tools/clang /freebsd-11-stable/contrib/llvm/tools/lldb /freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump /freebsd-11-stable/contrib/llvm/tools/llvm-lto /freebsd-11-stable/contrib/mdocml /freebsd-11-stable/contrib/mtree /freebsd-11-stable/contrib/ncurses /freebsd-11-stable/contrib/netcat /freebsd-11-stable/contrib/ntp /freebsd-11-stable/contrib/nvi /freebsd-11-stable/contrib/one-true-awk /freebsd-11-stable/contrib/openbsm /freebsd-11-stable/contrib/openpam /freebsd-11-stable/contrib/openresolv /freebsd-11-stable/contrib/pf /freebsd-11-stable/contrib/sendmail /freebsd-11-stable/contrib/serf /freebsd-11-stable/contrib/sqlite3 /freebsd-11-stable/contrib/subversion /freebsd-11-stable/contrib/tcpdump /freebsd-11-stable/contrib/tcsh /freebsd-11-stable/contrib/tnftp /freebsd-11-stable/contrib/top /freebsd-11-stable/contrib/top/install-sh /freebsd-11-stable/contrib/tzcode/stdtime /freebsd-11-stable/contrib/tzcode/zic /freebsd-11-stable/contrib/tzdata /freebsd-11-stable/contrib/unbound /freebsd-11-stable/contrib/vis /freebsd-11-stable/contrib/wpa /freebsd-11-stable/contrib/xz /freebsd-11-stable/crypto/heimdal /freebsd-11-stable/crypto/openssh /freebsd-11-stable/crypto/openssl /freebsd-11-stable/gnu/lib /freebsd-11-stable/gnu/usr.bin/binutils /freebsd-11-stable/gnu/usr.bin/cc/cc_tools /freebsd-11-stable/gnu/usr.bin/gdb /freebsd-11-stable/lib/libc/locale/ascii.c /freebsd-11-stable/sys/cddl/contrib/opensolaris /freebsd-11-stable/sys/contrib/dev/acpica /freebsd-11-stable/sys/contrib/ipfilter /freebsd-11-stable/sys/contrib/libfdt /freebsd-11-stable/sys/contrib/octeon-sdk /freebsd-11-stable/sys/contrib/x86emu /freebsd-11-stable/sys/contrib/xz-embedded /freebsd-11-stable/usr.sbin/bhyve/atkbdc.h /freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c /freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h /freebsd-11-stable/usr.sbin/bhyve/console.c /freebsd-11-stable/usr.sbin/bhyve/console.h /freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c /freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c /freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h /freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c /freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h /freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c /freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h /freebsd-11-stable/usr.sbin/bhyve/rfb.c /freebsd-11-stable/usr.sbin/bhyve/rfb.h /freebsd-11-stable/usr.sbin/bhyve/sockstream.c /freebsd-11-stable/usr.sbin/bhyve/sockstream.h /freebsd-11-stable/usr.sbin/bhyve/usb_emul.c /freebsd-11-stable/usr.sbin/bhyve/usb_emul.h /freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c /freebsd-11-stable/usr.sbin/bhyve/vga.c /freebsd-11-stable/usr.sbin/bhyve/vga.h
# 301456	05-Jun-2016	kib	Get rid of struct proc p_sched and struct thread td_sched pointers. p_sched is unused. The struct td_sched is always co-allocated with the struct thread, except for the thread0. Avoid useless indirection, instead calculate td_sched location using simple pointer arithmetic in td_get_sched(9). For thread0, which is statically allocated, create a structure to emulate layout of the dynamic allocation. Reviewed by: jhb (previous version) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D6711
# 301455	05-Jun-2016	kib	Use ANSI function definition. Sponsored by: The FreeBSD Foundation
# 298173	17-Apr-2016	markj	Use a loop instead of a goto in sysctl_kern_proc_kstack(). MFC after: 3 days
# 298145	17-Apr-2016	kib	The struct thread td_estcpu member is only used by the 4BSD scheduler. Move it to the struct td_sched for 4BSD, removing always present field, otherwise unused for ULE. New scheduler method sched_estcpu() returns the estimation for kinfo_proc consumption. As before, it always returns 0 for ULE. Remove sched_tick() scheduler method, unused both by 4BSD and ULE. Update locking comment for the 4BSD struct td_sched, copying it from the same comment for ULE. Spell MAXPRI as PRI_MAX_TIMESHARE in the 4BSD comment. Based on some notes from, and reviewed by: bde Sponsored by: The FreeBSD Foundation
# 298069	15-Apr-2016	pfg	kern: for pointers replace 0 with NULL. These are mostly cosmetical, no functional change. Found with devel/coccinelle.
# 295435	09-Feb-2016	kib	Rename P_KTHREAD struct proc p_flag to P_KPROC. I left as is an apparent bug in ntoskrnl_var.h:AT_PASSIVE_LEVEL() definition. Suggested by: jhb Sponsored by: The FreeBSD Foundation
# 295391	08-Feb-2016	kib	Remove the assert which outlived its usefulness, and, by default, disable compilation of the code which made it possible to call stop_all_proc() from usermode at all. Move the comment to the preamble of stop_all_proc() and reword it to give overview of the function intent. proc0 has P_HADTHREADS flag set due to kthread_add(), but no P_KTHREAD, which triggered the assert, which does not serve a purpose now. Reported by: Oliver Pinter Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 294472	20-Jan-2016	mjg	session: avoid proctree lock on proc exit when possible We can get away with the common case with only proc lock held. Reviewed by: kib
# 294468	20-Jan-2016	mjg	session: tidy up fixjobc This stops abusing the 'p' pointer for iteration over children processes and gets rid of useless locking around PRS_ZOMBIE check. Suggested by: kib
# 292440	18-Dec-2015	mjg	proc: fix a race which could result in dereference of bad p_pgrp pointer on fork During fork p_starcopy - p_endcopy area of a process is populated with bcopy with only proc lock held. Another forking thread can find such a process and proceed to access p_pgrp included in said area. Fix the problem by moving the field outside. It is being properly assigned later. Reviewed by: kib Diagnosed by: kib Tested by: Fabian Keil <freebsd-listen fabiankeil.de> MFC after: 10 days
# 292384	16-Dec-2015	markj	Fix style issues around existing SDT probes. - Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect at the moment, but will be needed for some future changes. - Don't hardcode the module component of the probe identifier. This is set automatically by the SDT framework. MFC after: 1 week
# 291961	07-Dec-2015	markj	Add helper functions proc_readmem() and proc_writemem(). These helper functions can be used to read in or write a buffer from or to an arbitrary process' address space. Without them, this can only be done using proc_rwmem(), which requires the caller to fill out a uio. This is onerous and results in code duplication; the new functions provide a simpler interface which is sufficient for most existing callers of proc_rwmem(). This change also adds a manual page for proc_rwmem() and the new functions. Reviewed by: jhb, kib Differential Revision: https://reviews.freebsd.org/D4245
# 290728	12-Nov-2015	jhb	Export various helper variables describing the layout and size of certain kernel structures for use by debuggers. This mostly aids in examining cores from a kernel without debug symbols as a debugger can infer these values if debug symbols are available. One set of variables describes the layout of 'struct linker_file' to walk the list of loaded kernel modules. A second set of variables describes the layout of 'struct proc' and 'struct thread' to walk the list of processes in the kernel and the threads in each process. The 'pcb_size' variable is used to index into the stoppcbs[] array. The 'vm_maxuser_address' is used to distinguish kernel virtual addresses from user addresses. This doesn't have to be perfect, and 'vm_maxuser_address' is a cheap and simple way to differentiate kernel pointers from simple values like TIDs and PIDs. While here, annotate the fields in struct pcb used by kgdb on amd64 and i386 to note that their ABI should be preserved. Annotations for other platforms will be added in the future. Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3773
# 288944	06-Oct-2015	cem	Fix core corruption caused by race in note_procstat_vmmap This fix is spiritually similar to r287442 and was discovered thanks to the KASSERT added in that revision. NT_PROCSTAT_VMMAP output length, when packing kinfo structs, is tied to the length of filenames corresponding to vnodes in the process' vm map via vn_fullpath. As vnodes may move during coredump, this is racy. We do not remove the race, only prevent it from causing coredump corruption. - Add a sysctl, kern.coredump_pack_vmmapinfo, to allow users to disable kinfo packing for PROCSTAT_VMMAP notes. This avoids VMMAP corruption and truncation, even if names change, at the cost of up to PATH_MAX bytes per mapped object. The new sysctl is documented in core.5. - Fix note_procstat_vmmap to self-limit in the second pass. This addresses corruption, at the cost of sometimes producing a truncated result. - Fix PROCSTAT_VMMAP consumers libutil (and libprocstat, via copy-paste) to grok the new zero padding. Reported by: pho (https://people.freebsd.org/~pho/stress/log/datamove4-2.txt) Relnotes: yes Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3824
# 288336	28-Sep-2015	avg	save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters where n is typically smaller than 5. Perhaps SDT_PROBE should be made a private implementation detail. MFC after: 20 days
# 287864	16-Sep-2015	jhb	When a process group leader exits, all of the processes in the group are sent SIGHUP and SIGCONT if any of the processes are stopped. Currently this behavior is triggered for any type of process stop including ptrace() stops and transient stops for single threading during exit() and execve(). Thus, if a debugger is attached to a process in a group when the leader exits, the entire group can be HUPed. Instead, only send the signals if a process in the group is stopped due to SIGSTOP. PR: 201149 Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D3681
# 287645	11-Sep-2015	markj	Add stack_save_td_running(), a function to trace the kernel stack of a running thread. It is currently implemented only on amd64 and i386; on these architectures, it is implemented by raising an NMI on the CPU on which the target thread is currently running. Unlike stack_save_td(), it may fail, for example if the thread is running in user mode. This change also modifies the kern.proc.kstack sysctl to use this function, so that stacks of running threads are shown in the output of "procstat -kk". This is handy for debugging threads that are stuck in a busy loop. Reviewed by: bdrewery, jhb, kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3256
# 285670	18-Jul-2015	kib	The si_status field of the siginfo_t, provided by the waitid(2) and SIGCHLD signal, should keep full 32 bits of the status passed to the _exit(2). Split the combined p_xstat of the struct proc into the separate exit status p_xexit for normal process exit, and signalled termination information p_xsig. Kernel-visible macro KW_EXITCODE() reconstructs old p_xstat from p_xexit and p_xsig. p_xexit contains complete status and copied out into si_status. Requested by: Joerg Schilling Reviewed by: jilles (previous version), pho Tested by: pho Sponsored by: The FreeBSD Foundation
# 284215	10-Jun-2015	mjg	Implement lockless resource limits. Use the same scheme implemented to manage credentials. Code needing to look at process's credentials (as opposed to thred's) is provided with *_proc variants of relevant functions. Places which possibly had to take the proc lock anyway still use the proc pointer to access limits.
# 283924	02-Jun-2015	vangyzen	Provide vnode in memory map info for files on tmpfs When providing memory map information to userland, populate the vnode pointer for tmpfs files. Set the memory mapping to appear as a vnode type, to match FreeBSD 9 behavior. This fixes the use of tmpfs files with the dtrace pid provider, procstat -v, procfs, linprocfs, pmc (pmcstat), and ptrace (PT_VM_ENTRY). Submitted by: Eric Badger <eric@badgerio.us> (initial revision) Obtained from: Dell Inc. PR: 198431 MFC after: 2 weeks Reviewed by: jhb Approved by: kib (mentor)
# 282086	27-Apr-2015	trasz	Make setproctitle(3) work in Capsicum capability mode. This makes ctld(8) child processes to indicate initiator address and name in their titles, similar to what iscsid(8) child processes do. PR: 181352 Differential Revision: https://reviews.freebsd.org/D2363 Reviewed by: rwatson@, mjg@ MFC after: 1 month Sponsored by: The FreeBSD Foundation
# 280355	22-Mar-2015	ian	The sysctls that return process argv and envv return binary data, so clear the SBUF_INCLUDENUL flag. Pointed out by: tijl@
# 280332	21-Mar-2015	mjg	proc: use MTX_NEW flag in proc_init This allows us to get rid of bzero which was added specifically to make mtx_init on p_mtx reliable. This also fixes a potential problem where mtx_init on other mutexes could trip over on unitialized memory and fire an assertion. Reviewed by: kib
# 279993	14-Mar-2015	ian	Set the SBUF_INCLUDENUL flag in sbuf_new_for_sysctl() so that sysctl strings returned to userland include the nulterm byte. Some uses of sbuf_new_for_sysctl() write binary data rather than strings; clear the SBUF_INCLUDENUL flag after calling sbuf_new_for_sysctl() in those cases. (Note that the sbuf code still automatically adds a nulterm byte in sbuf_finish(), but since it's not included in the length it won't get copied to userland along with the binary data.) Remove explicit adding of a nulterm byte in a couple places now that it gets done automatically by the sbuf drain code. PR: 195668
# 275753	14-Dec-2014	kib	Fix gcc build. Sponsored by: The FreeBSD Foundation MFC after: 13 days
# 275745	13-Dec-2014	kib	Add facility to stop all userspace processes. The supposed use of the feature is to quisce the system before suspend. Stop is implemented by reusing the thread_single(9) with the special mode SINGLE_ALLPROC. SINGLE_ALLPROC differs from the existing single-threading modes by allowing (requiring) caller to operate on other process. Interruptible sleeps for !TDF_SBDRY threads are suspended like SIGSTOP does it, instead of aborting the sleep, like SINGLE_NO_EXIT, to avoid spurious EINTRs on resume. Provide debugging sysctl debug.stop_all_proc, which causes total stop and suspends syncer, while waiting for variable reset for resume. It is used for debugging; should be removed after the real use of the interface is added. In collaboration with: pho Discussed with: avg Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 275372	01-Dec-2014	kib	Disable recursion for the process spinlock. Tested by: pho Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 month
# 275121	26-Nov-2014	kib	The process spin lock currently has the following distinct uses: - Threads lifetime cycle, in particular, counting of the threads in the process, and interlocking with process mutex and thread lock. The main reason of this is that turnstile locks are after thread locks, so you e.g. cannot unlock blockable mutex (think process mutex) while owning thread lock. - Virtual and profiling itimers, since the timers activation is done from the clock interrupt context. Replace the p_slock by p_itimmtx and PROC_ITIMLOCK(). - Profiling code (profil(2)), for similar reason. Replace the p_slock by p_profmtx and PROC_PROFLOCK(). - Resource usage accounting. Need for the spinlock there is subtle, my understanding is that spinlock blocks context switching for the current thread, which prevents td_runtime and similar fields from changing (updates are done at the mi_switch()). Replace the p_slock by p_statmtx and PROC_STATLOCK(). The split is done mostly for code clarity, and should not affect scalability. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 273266	18-Oct-2014	adrian	Update the ULE scheduler + thread and kinfo structs to use int for cpuid rather than u_char. To try and play nice with the ABI, the u_char CPU ID values are clamped at 254. The new fields now contain the full CPU ID, or -1 for no cpu. Differential Revision: D955 Reviewed by: jhb, kib Sponsored by: Norse Corp, Inc.
# 272566	05-Oct-2014	kib	On error, sbuf_bcat() returns -1. Some callers returned this -1 to the upper layers, which interpret it as errno value, which happens to be ERESTART. The result was spurious restarts of the sysctls in loop, e.g. kern.proc.proc, instead of returning ENOMEM to caller. Convert -1 from sbuf_bcat() to ENOMEM, when returning to the callers expecting errno. In collaboration with: pho Sponsored by: The FreeBSD Foundation (kib) MFC after: 1 week
# 271074	03-Sep-2014	mjg	Plug a hypothetical use after free in sysctl kern.proc.groups. MFC after: 1 week
# 270999	03-Sep-2014	glebius	Fix dereference after NULL check. CID: 1234607 Sponsored by: Nginx, Inc.
# 270745	28-Aug-2014	mjg	Return real parent pid in kinfo (used by e.g. ps) Add a separate field which exports tracer pid and add a new keyword ("tracer") for ps to display it. This is a follow up to r270444. Reviewed by: kib MFC after: 1 week Relnotes: yes
# 269656	07-Aug-2014	kib	Correct the problems with the ptrace(2) making the debuggee an orphan. One problem is inferior(9) looping due to the process tree becoming a graph instead of tree if the parent is traced by child. Another issue is due to the use of p_oppid to restore the original parent/child relationship, because real parent could already exited and its pid reused (noted by mjg). Add the function proc_realparent(9), which calculates the parent for given process. It uses the flag P_TREE_FIRST_ORPHAN to detect the head element of the p_orphan list and than stepping back to its container to find the parent process. If the parent has already exited, the init(8) is returned. Move the P_ORPHAN and the new helper flag from the p_flag* to new p_treeflag field of struct proc, which is protected by proctree lock instead of proc lock, since the orphans relationship is managed under the proctree_lock already. The remaining uses of p_oppid in ptrace(PT_DETACH) and process reapping are replaced by proc_realparent(9). Phabric: D417 Reviewed by: jhb Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 269205	28-Jul-2014	kib	Simplify the expression, by removing redundand calculation. Noted by: "O'Connor, Daniel" <Daniel.O'Connor@emc.com> MFC after: 3 days
# 268712	15-Jul-2014	kib	Followup to r268466. - Move the code to calculate resident count into separate function. It reduces the indent level and makes the operation of vmmap_skip_res_cnt tunable more clear. - Optimize the calculation of the resident page count for map entry. Skip directly to the next lowest available index and page among the whole shadow chain. - Restore the use of pmap_incore(9), only to verify that current mapping is indeed superpage. - Note the issue with the invalid pages. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 268711	15-Jul-2014	kib	Change the calculation of the kinfo_vmentry field kve_private_resident to reflect its name. Noted and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 268490	10-Jul-2014	kib	Unconditionally initialize addr to handle the case of changed map timestamp while the map is unlocked. Reported by: bz Sponsored by: The FreeBSD Foundation MFC after: 6 days
# 268466	09-Jul-2014	kib	Current code in sysctl proc.vmmap, which intent is to calculate the amount of resident pages, in fact calculates the amount of installed pte entries in the region. Resident pages which were not soft-faulted yet are not counted. Calculate the amount of resident pages by looking in the objects chain backing the region. Add a knob to disable the residency calculation at all. For large sparce regions, either previous or updated algorithm runs for too long time, while several introspection tools do not need the (advisory) RSS value at all. PR: kern/188911 Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 261780	11-Feb-2014	jhb	Expose OBJT_MGTDEVICE VM objects used for GEM/TTM with drm2 as an explicit object type. Reviewed by: kib MFC after: 1 week
# 258661	26-Nov-2013	kib	Add an kinfo sysctl to retrieve signal trampoline location for the given process. Note that the correctness of the trampoline length returned for ABIs which do not use shared page depends on the correctness of the struct sysvec sv_szsigcodebase member, which will be fixed on as-need basis. Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 258622	26-Nov-2013	avg	dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks
# 258541	25-Nov-2013	attilio	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invokation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip
# 255708	19-Sep-2013	jhb	Extend the support for exempting processes from being killed when swap is exhausted. - Add a new protect(1) command that can be used to set or revoke protection from arbitrary processes. Similar to ktrace it can apply a change to all existing descendants of a process as well as future descendants. - Add a new procctl(2) system call that provides a generic interface for control operations on processes (as opposed to the debugger-specific operations provided by ptrace(2)). procctl(2) uses a combination of idtype_t and an id to identify the set of processes on which to operate similar to wait6(). - Add a PROC_SPROTECT control operation to manage the protection status of a set of processes. MADV_PROTECT still works for backwards compatability. - Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc) the first bit of which is used to track if P_PROTECT should be inherited by new child processes. Reviewed by: kib, jilles (earlier version) Approved by: re (delphij) MFC after: 1 month
# 254943	26-Aug-2013	will	Add the ability to display the default FIB number for a process to the ps(1) utility, e.g. "ps -O fib". bin/ps/keyword.c: Add the "fib" keyword and default its column name to "FIB". bin/ps/ps.1: Add "fib" as a supported keyword. sys/compat/freebsd32/freebsd32.h: sys/kern/kern_proc.c: sys/sys/user.h: Add the default fib number for a process (p->p_fibnum) to the user land accessible process data of struct kinfo_proc. Submitted by: Oliver Fromme <olli@fromme.com>, gibbs
# 254350	15-Aug-2013	markj	Specify SDT probe argument types in the probe definition itself rather than using SDT_PROBE_ARGTYPE(). This will make it easy to extend the SDT(9) API to allow probes with dynamically-translated types. There is no functional change. MFC after: 2 weeks
# 249488	14-Apr-2013	trociny	Similarly to proc_getargv() and proc_getenvv(), export proc_getauxv() to be able to reuse the code. MFC after: 3 weeks
# 249487	14-Apr-2013	trociny	Re-factor the code to provide kern_proc_filedesc_out(), kern_proc_out(), and kern_proc_vmmap_out() functions to output process kinfo structures to sbuf, to make the code reusable. The functions are going to be used in the coredump routine to store procstat info in the core program header notes. Reviewed by: kib MFC after: 3 weeks
# 249277	08-Apr-2013	attilio	Switch some "low-hanging fruit" to acquire read lock on vmobjects rather than write locks. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho
# 248084	09-Mar-2013	attilio	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
# 243528	25-Nov-2012	pjd	Look for zombie process only if we were given process id. Reviewed by: kib MFC after: 2 weeks X-MFC-after-or-with: 243142
# 243142	16-Nov-2012	kib	In pget(9), if PGET_NOTWEXIT flag is not specified, also search the zombie list for the pid. This allows several kern.proc sysctls to report useful information for zombies. Hold the allproc_lock around all searches instead of relocking it. Remove private pfind_locked() from the new nfs client code. Requested and reviewed by: pjd Tested by: pho MFC after: 3 weeks
# 243007	13-Nov-2012	mjg	enterpgrp: get rid of pgrp2 variable and use KASSERT directly on pgfind result. pgrp2 was used only for debugging, but pgrp2 = pgfind(..) was present in compiled code even for kernels without INVARIANTS Approved by: trasz (mentor) MFC after: 1 week
# 241896	22-Oct-2012	kib	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
# 239295	15-Aug-2012	obrien	Don't include opt_ddb.h & <ddb/ddb.h> twice.
# 239065	05-Aug-2012	kib	After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason to pull vm_param.h was removed. Other big dependency of vm_page.h on vm_param.h are PA_LOCK* definitions, which are only needed for in-kernel code, because modules use KBI-safe functions to lock the pages. Stop including vm_param.h into vm_page.h. Include vm_param.h explicitely for the kernel code which needs it. Suggested and reviewed by: alc MFC after: 2 weeks
# 238527	16-Jul-2012	pgj	- Add support for displaying process stack memory regions. Approved by: rwatson MFC after: 3 days
# 236136	27-May-2012	kib	Fix ki_cow for compat32 binaries. MFC after: 3 days
# 235850	23-May-2012	kib	Calculate the count of per-process cow faults. Export the count to userspace using the obscure spare int field in struct kinfo_proc. Submitted by: Andrey Zonov <andrey zonov org> MFC after: 1 week
# 234616	23-Apr-2012	kib	Allow for the process information sysctls to accept a thread id in addition to the process id. It follows the ptrace(2) interface and allows debugging libraries to use thread ids directly, without slow and verbose conversion of thread id into pid. The PGET_NOTID flag is provided to allow a specific sysctl to disallow this behaviour. All current callers of pget(9) have useful semantic to operate on tid and do not need this flag. Reviewed by: jhb, trocini MFC after: 1 week
# 233389	23-Mar-2012	trociny	Add a sysctl to set and retrieve binary osreldate of another process. Suggested by: kib Reviewed by: kib MFC after: 2 weeks
# 232455	03-Mar-2012	trociny	Make kern.proc.umask sysctl readonly. Requested by: src MFC after: 1 week
# 232181	26-Feb-2012	trociny	Add sysctl to retrieve or set umask of another process. Submitted by: Dmitry Banschikov <me ubique spb ru> Discussed with: kib, rwatson Reviewed by: kib MFC after: 2 weeks
# 230550	25-Jan-2012	trociny	Fix CTL flags in the declarations of KERN_PROC_ENV, AUXV and PS_STRINGS sysctls: they are read only. MFC after: 1 week
# 230470	22-Jan-2012	trociny	Change kern.proc.rlimit sysctl to: - retrive only one, specified limit for a process, not the whole array, as it was previously (the sysctl has been added recently and has not been backported to stable yet, so this change is ok); - allow to set a resource limit for another process. Submitted by: Andrey Zonov <andrey at zonov.org> Discussed with: kib Reviewed by: kib MFC after: 2 weeks
# 230145	15-Jan-2012	trociny	Abrogate nchr argument in proc_getargv() and proc_getenvv(): we always want to read strings completely to know the actual size. As a side effect it fixes the issue with kern.proc.args and kern.proc.env sysctls, which didn't return the size of available data when calling sysctl(3) with the NULL argument for oldp. Note, in get_ps_strings(), which does actual work for proc_getargv() and proc_getenvv(), we still have a safety limit on the size of data read in case of a corrupted procces stack. Suggested by: kib MFC after: 3 days
# 228666	17-Dec-2011	trociny	Fix style and white spaces. MFC after: 1 week
# 228648	17-Dec-2011	trociny	On start most of sysctl_kern_proc functions use the same pattern: locate a process calling pfind() and do some additional checks like p_candebug(). To reduce this code duplication a new function pget() is introduced and used. As the function may be useful not only in kern_proc.c it is in the kernel name space. Suggested by: kib Reviewed by: kib MFC after: 2 weeks
# 228302	06-Dec-2011	trociny	Really protect kern.proc.ps_strings sysctls with p_candebug(). This was intended to be in r228288. Spotted by: many MFC after: 1 week
# 228288	05-Dec-2011	trociny	Protect kern.proc.auxv and kern.proc.ps_strings sysctls with p_candebug(). Citing jilles: If we are ever going to do ASLR, the AUXV information tells an attacker where the stack, executable and RTLD are located, which defeats much of the point of randomizing the addresses in the first place. Given that the AUXV information seems to be used by debuggers only anyway, I think it would be good to move it to p_candebug() now. The full virtual memory maps (KERN_PROC_VMMAP, procstat -v) are already under p_candebug(). Suggested by: jilles Discussed with: rwatson MFC after: 1 week
# 228264	04-Dec-2011	trociny	In sysctl_kern_proc_ps_strings() there is no much sense in checking for P_WEXIT and P_SYSTEM flags. Reviewed by: kib
# 228030	27-Nov-2011	trociny	Add sysctl to retrieve ps_strings structure location of another process. Suggested by: kib Reviewed by: kib
# 228029	27-Nov-2011	trociny	In sysctl_kern_proc_auxv the process was released too early: we still need to hold it when checking process sv_flags. MFC after: 2 weeks
# 227955	24-Nov-2011	trociny	Add sysctl to get process resource limits. Reviewed by: kib MFC after: 2 weeks
# 227874	23-Nov-2011	trociny	Fix build without INVARIANTS. Discussed with: kib
# 227833	22-Nov-2011	trociny	Add new sysctls, KERN_PROC_ENV and KERN_PROC_AUXV, to return environment strings and ELF auxiliary vectors from a process stack. Make sysctl_kern_proc_args to read not cached arguments from the process stack. Export proc_getargv() and proc_getenvv() so they can be reused by procfs and linprocfs. Suggested by: kib Reviewed by: kib Discussed with: kib, rwatson, jilles Tested by: pho MFC after: 2 weeks
# 227786	21-Nov-2011	pluknet	Remove no more relevant XXXRW comments since accessing the vmspace is now properly done with the acquired vmspace reference. Pointed out by: kib
# 227784	21-Nov-2011	pluknet	Use the acquired reference to the vmspace instead of direct dereferencing of p->p_vmspace like it is done in sysctl_kern_proc_vmmap().
# 227316	07-Nov-2011	trociny	Add KVME_FLAG_SUPER and use it in sysctl_kern_proc_vmmap for marking entries with superpages. Submitted by: Mel Flynn <mel.flynn+fbsd.hackers@mailing.thruhere.net> Reviewed by: alc, rwatson
# 225617	16-Sep-2011	kmacy	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
# 224986	18-Aug-2011	jhb	One of the general principles of the sysctl(3) API is that a user can query the needed size for a sysctl result by passing in a NULL old pointer and a valid oldsize. The kern.proc.args sysctl handler broke this assumption by not calling SYSCTL_OUT() if the old pointer was NULL. Approved by: re (kib) MFC after: 3 days
# 224199	18-Jul-2011	bz	Rename ki_ocomm to ki_tdname and OCOMMLEN to TDNAMLEN. Provide backward compatibility defines under BURN_BRIDGES. Suggested by: jhb Reviewed by: emaste Sponsored by: Sandvine Incorporated Approved by: re (kib)
# 224188	18-Jul-2011	jhb	- Export each thread's individual resource usage in in struct kinfo_proc's ki_rusage member when KERN_PROC_INC_THREAD is passed to one of the process sysctls. - Correctly account for the current thread's cputime in the thread when doing the runtime fixup in calcru(). - Use TIDs as the key to lookup the previous thread to compute IO stat deltas in IO mode in top when thread display is enabled. Reviewed by: kib Approved by: re (kib)
# 221807	12-May-2011	stas	- Commit work from libprocstat project. These patches add support for runtime file and processes information retrieval from the running kernel via sysctl in the form of new library, libprocstat. The library also supports KVM backend for analyzing memory crash dumps. Both procstat(1) and fstat(1) utilities have been modified to take advantage of the library (as the bonus point the fstat(1) utility no longer need superuser privileges to operate), and the procstat(1) utility is now able to display information from memory dumps as well. The newly introduced fuser(1) utility also uses this library and able to operate via sysctl and kvm backends. The library is by no means complete (e.g. KVM backend is missing vnode name resolution routines, and there're no manpages for the library itself) so I plan to improve it further. I'm commiting it so it will get wider exposure and review. We won't be able to MFC this work as it relies on changes in HEAD, which was introduced some time ago, that break kernel ABI. OTOH we may be able to merge the library with KVM backend if we really need it there. Discussed with: rwatson
# 219968	24-Mar-2011	jhb	Fix some locking nits with the p_state field of struct proc: - Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL in fork to honor the locking requirements. While here, expand the scope of the PROC_LOCK() on the new process (p2) to avoid some LORs. Previously the code was locking the new child process (p2) after it had locked the parent process (p1). However, when locking two processes, the safe order is to lock the child first, then the parent. - Fix various places that were checking p_state against PRS_NEW without having the process locked to use PROC_LOCK(). Every place was already locking the process, just after the PRS_NEW check. - Remove or reduce the use of PROC_SLOCK() for places that were checking p_state against PRS_NEW. The PROC_LOCK() alone is sufficient for reading the current state. - Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once. MFC after: 1 week
# 219307	05-Mar-2011	trasz	Export login class information via kinfo and make it possible to view it using "ps -o class".
# 219129	01-Mar-2011	rwatson	Add initial support for Capsicum's Capability Mode to the FreeBSD kernel, compiled conditionally on options CAPABILITIES: Add a new credential flag, CRED_FLAG_CAPMODE, which indicates that a subject (typically a process) is in capability mode. Add two new system calls, cap_enter(2) and cap_getmode(2), which allow setting and querying (but never clearing) the flag. Export the capability mode flag via process information sysctls. Sponsored by: Google, Inc. Reviewed by: anderson Discussed with: benl, kris, pjd Obtained from: Capsicum Project MFC after: 3 months
# 217819	25-Jan-2011	kib	Allow debugger to specify that children of the traced process should be automatically traced. Extend the ptrace(PL_LWPINFO) to report that child just forked. Reviewed by: davidxu, jhb MFC after: 2 weeks
# 215304	14-Nov-2010	brucec	Fix some more style(9) issues.
# 215283	14-Nov-2010	brucec	Fix style(9) issues from r215281 and r215282. MFC after: 1 week
# 215281	14-Nov-2010	brucec	Add some descriptions to sys/kern sysctls. PR: kern/148710 Tested by: Chip Camden <sterling at camdensoftware.com> MFC after: 1 week
# 215145	11-Nov-2010	trasz	Fix style. Submitted by: bde
# 215111	11-Nov-2010	trasz	Remove unneeded conditional. Discussed with: kib
# 213536	07-Oct-2010	emaste	Make a thread's address available via the kern proc sysctl, just like the process address. Add "tdaddr" keyword to ps(1) to display this thread address. Distilled from Sandvine's patch set by Mark Johnston.
# 211616	22-Aug-2010	rpaulo	Add an extra comment to the SDT probes definition. This allows us to get use '-' in probe names, matching the probe names in Solaris.[1] Add userland SDT probes definitions to sys/sdt.h. Sponsored by: The FreeBSD Foundation Discussed with: rwaston [1]
# 211514	19-Aug-2010	jhb	There isn't really a need to hold the ktrace mutex just to read the value of p_traceflag that is stored in the kinfo_proc structure. It is still racey even with the lock and the code will read a consistent snapshot of the flag without the lock.
# 208587	27-May-2010	attilio	Add the support for reporting the NOCOREDUMP flag from sysctl_kern_proc_vmmap(). Sponsored by: Sandvine Incorporated Reviewed by: kib, emaste MFC after: 1 week
# 207659	05-May-2010	kib	Fix a mistake in r207603. td_rux.rux_runtime still needs conversion. Reported and tested by: nwhitehorn Pointy hat to: kib MFC after: 6 days
# 207603	04-May-2010	kib	Use td_rux.rux_runtime for ki_runtime instead of redoing calculation. Submitted by: bde MFC after: 1 week
# 207363	29-Apr-2010	kib	Remove caddr_t casts. Requested by: bde MFC after: 10 days
# 207152	24-Apr-2010	kib	Move the constants specifying the size of struct kinfo_proc into machine-specific header files. Add KINFO_PROC32_SIZE for struct kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add CTASSERT for the size of struct kinfo_proc32. Submitted by: pluknet Reviewed by: imp, jhb, nwhitehorn MFC after: 2 weeks
# 207016	21-Apr-2010	kib	Fix typo. Submitted by: emaste Pointy hat to: kib (who needs much bigger wardrobe) MFC after: 1 week
# 207008	21-Apr-2010	kib	Provide compat32 shims for kinfo_proc sysctl. This allows 32bit ps(1) to mostly work on 64bit host. The work is based on an original patch submitted by emaste, obtained from Sandvine's source tree. Reviewed by: jhb MFC after: 1 week
# 204413	27-Feb-2010	kib	For kinfo_proc in kp->ki_siglist, return the set of the signals pending in the process queue when gathering information for the process, and set of signals pending for the thread, when gathering information for the thread. Previously, the sysctl returned a union of the process and some arbitrary thread pending set for the process, and union of the process and the thread pending set for the thread. MFC after: 1 week
# 204410	27-Feb-2010	jilles	Include terminated threads in ps's process cpu time field. MFC after: 2 weeks
# 200995	25-Dec-2009	bz	Remove an unused global. MFC after: 3 days
# 200732	19-Dec-2009	ed	Let access overriding to TTYs depend on the cdev_priv, not the vnode. Basically this commit changes two things, which improves access to TTYs in exceptional conditions. Basically the problem was that when you ran jexec(8) to attach to a jail, you couldn't use /dev/tty (well, also the node of the actual TTY, e.g. /dev/pts/X). This is very inconvenient if you want to attach to screens quickly, use ssh(1), etc. The fixes: - Cache the cdev_priv of the controlling TTY in struct session. Change devfs_access() to compare against the cdev_priv instead of the vnode. This allows you to bypass UNIX permissions, even across different mounts of devfs. - Extend devfs_prison_check() to unconditionally expose the device node of the controlling TTY, even if normal prison nesting rules normally don't allow this. This actually allows you to interact with this device node. To be honest, I'm not really happy with this solution. We now have to store three pointers to a controlling TTY (s_ttyp, s_ttyvp, s_ttydp). In an ideal world, we should just get rid of the latter two and only use s_ttyp, but this makes certian pieces of code very impractical (e.g. devfs, kern_exit.c). Reported by: Many people
# 197692	01-Oct-2009	emaste	In fill_kinfo_thread, copy the thread's name into struct kinfo_proc even if it is empty. Otherwise the previous thread's name would remain in the struct and then be reported for this thread. Submitted by: Ryan Stone MFC after: 1 week
# 196730	01-Sep-2009	kib	Reintroduce the r196640, after fixing the problem with my testing. Remove the altkstacks, instead instantiate threads with kernel stack allocated with the right size from the start. For the thread that has kernel stack cached, verify that requested stack size is equial to the actual, and reallocate the stack if sizes differ [1]. This fixes the bug introduced by r173361 that was committed several days after r173004 and consisted of kthread_add(9) ignoring the non-default kernel stack size. Also, r173361 removed the caching of the kernel stacks for a non-first thread in the process. Introduce separate kernel stack cache that keeps some limited amount of preallocated kernel stacks to lower the latency of thread allocation. Add vm_lowmem handler to prune the cache on low memory condition. This way, system with reasonable amount of the threads get lower latency of thread creation, while still not exhausting significant portion of KVA for unused kstacks. Submitted by: peter [1] Discussed with: jhb, julian, peter Reviewed by: jhb Tested by: pho (and retested according to new test scenarious) MFC after: 1 week
# 196648	29-Aug-2009	kib	Reverse r196640 and r196644 for now.
# 196640	29-Aug-2009	kib	Remove the altkstacks, instead instantiate threads with kernel stack allocated with the right size from the start. For the thread that has kernel stack cached, verify that requested stack size is equial to the actual, and reallocate the stack if sizes differ [1]. This fixes the bug introduced by r173361 that was committed several days after r173004 and consisted of kthread_add(9) ignoring the non-default kernel stack size. Also, r173361 removed the caching of the kernel stacks for a non-first thread in the process. Introduce separate kernel stack cache that keeps some limited amount of preallocated kernel stacks to lower the latency of thread allocation. Add vm_lowmem handler to prune the cache on low memory condition. This way, system with reasonable amount of the threads get lower latency of thread creation, while still not exhausting significant portion of KVA for unused kstacks. Submitted by: peter [1] Discussed with: jhb, julian, peter Reviewed by: jhb Tested by: pho MFC after: 1 week
# 195853	24-Jul-2009	brooks	Introduce a new sysctl process mib, kern.proc.groups which adds the ability to retrieve the group list of each process. Modify procstat's -s option to query this mib when the kinfo_proc reports that the field has been truncated. If the mib does not exist, fall back to the truncated list. Reviewed by: rwatson Approved by: re (kib) MFC after: 2 weeks
# 195843	24-Jul-2009	brooks	Revert the changes to struct kinfo_proc in r194498. Instead, fill in up to 16 (KI_NGROUPS) values and steal a bit from ki_cr_flags (all bits currently unused) to indicate overflow with the new flag KI_CRF_GRP_OVERFLOW. This fixes procstat -s. Approved by: re (kib)
# 195840	24-Jul-2009	jhb	Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to a device pager (OBJT_DEVICE) object in that it uses fictitious pages to provide aliases to other memory addresses. The primary difference is that it uses an sglist(9) to determine the physical addresses for a given offset into the object instead of invoking the d_mmap() method in a device driver. Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks
# 194498	19-Jun-2009	brooks	Rework the credential code to support larger values of NGROUPS and NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.) The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc. Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care for the details of allocating groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for insuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search. Because we can not change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16 and is to be used or NGRPS as appropriate for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error. Minor changes: - Reduce the number of hand rolled versions of groupmember(). - Do not assign to both cr_gid and cr_groups[0]. - Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity. Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867
# 193255	01-Jun-2009	rwatson	Add a flags field to struct ucred, and export that via kinfo_proc, consuming one of its spare fields. The cr_flags field is currently unused, but will be used for features, including capability mode and pay-as-you-go audit. Discussed with: jhb, sson
# 192895	27-May-2009	jamie	Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)
# 188764	18-Feb-2009	attilio	- Add a function (fill_kinfo_aggregate()) which aggregates relevant members for a kinfo entry on a process-wide system. - Use the newly introduced function in order to fix cases like KERN_PROC_PROC where aggregating stats are broken because they just consider the first thread in the pool for each process. (Note, additively, that KERN_PROC_PROC is rather inaccurate on thread-wide informations like the 'state' of the process. Such informations should maybe be invalidated and being forceably discarded by the consumers?). - Simplify the logic of sysctl_out_proc() and adjust the fill_kinfo_thread() accordingly. - Remove checks on the FIRST_THREAD_IN_PROC() being NULL but add assertives. This patch should fix aggregate statistics for KERN_PROC_PROC. This is one of the reasons why top doesn't use this option and now it can be use it safely. ps, when launched in order to display just processes, now should report correct cpu utilization percentages and times (as opposed by the old code). Reviewed by: jhb, emaste Sponsored by: Sandvine Incorporated
# 187657	23-Jan-2009	jhb	- Add conditional Giant locking around the vrele() in sysctl_kern_proc_pathname(). - Mark all the kern.proc.* sysctls as MPSAFE. Submitted by: csjp (2)
# 186563	29-Dec-2008	kib	vm_map_lock_read() does not increment map->timestamp, so we should compare map->timestamp with saved timestamp after map read lock is reacquired, not with saved timestamp + 1. The only consequence of the +1 was unconditional lookup of the next map entry, though. Tested by: pho Approved by: des MFC after: 2 weeks
# 185984	12-Dec-2008	kib	Reference the vmspace of the process being inspected by procfs, linprocfs and sysctl kern_proc_vmmap handlers. Reported and tested by: pho Reviewed by: rwatson, des MFC after: 1 week
# 185764	08-Dec-2008	kib	Do drop vm map lock earlier in the sysctl_kern_proc_vmmap(), to avoid locking a vnode while having vm map locked. Reported and tested by: pho MFC after: 1 week
# 185647	05-Dec-2008	kib	Several threads in a process may do vfork() simultaneously. Then, all parent threads sleep on the parent' struct proc until corresponding child releases the vmspace. Each sleep is interlocked with proc mutex of the child, that triggers assertion in the sleepq_add(). The assertion requires that at any time, all simultaneous sleepers for the channel use the same interlock. Silent the assertion by using conditional variable allocated in the child. Broadcast the variable event on exec() and exit(). Since struct proc * sleep wait channel is overloaded for several unrelated events, I was unable to remove wakeups from the places where cv_broadcast() is added, except exec(). Reported and tested by: ganbold Suggested and reviewed by: jhb MFC after: 2 week
# 185548	02-Dec-2008	peter	Merge user/peter/kinfo branch as of r185547 into head. This changes struct kinfo_filedesc and kinfo_vmentry such that they are same on both 32 and 64 bit platforms like i386/amd64 and won't require sysctl wrapping. Two new OIDs are assigned. The old ones are available under COMPAT_FREEBSD7 - but it isn't that simple. The superceded interface was never actually released on 7.x. The other main change is to pack the data passed to userland via the sysctl. kf_structsize and kve_structsize are reduced for the copyout. If you have a process with 100,000+ sockets open, the unpacked records require a 132MB+ copyout. With packing, it is "only" ~35MB. (Still seriously unpleasant, but not quite as devastating). A similar problem exists for the vmentry structure - have lots and lots of shared libraries and small mmaps and its copyout gets expensive too. My immediate problem is valgrind. It traditionally achieves this functionality by parsing procfs output, in a packed format. Secondly, when tracing 32 bit binaries on amd64 under valgrind, it uses a cross compiled 32 bit binary which ran directly into the differing data structures in 32 vs 64 bit mode. (valgrind uses this to track file descriptor operations and this therefore affected every single 32 bit binary) I've added two utility functions to libutil to unpack the structures into a fixed record length and to make it a little more convenient to use.
# 185029	17-Nov-2008	pjd	Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes. This bring huge amount of changes, I'll enumerate only user-visible changes: - Delegated Administration Allows regular users to perform ZFS operations, like file system creation, snapshot creation, etc. - L2ARC Level 2 cache for ZFS - allows to use additional disks for cache. Huge performance improvements mostly for random read of mostly static content. - slog Allow to use additional disks for ZFS Intent Log to speed up operations like fsync(2). - vfs.zfs.super_owner Allows regular users to perform privileged operations on files stored on ZFS file systems owned by him. Very careful with this one. - chflags(2) Not all the flags are supported. This still needs work. - ZFSBoot Support to boot off of ZFS pool. Not finished, AFAIK. Submitted by: dfr - Snapshot properties - New failure modes Before if write requested failed, system paniced. Now one can select from one of three failure modes: - panic - panic on write error - wait - wait for disk to reappear - continue - serve read requests if possible, block write requests - Refquota, refreservation properties Just quota and reservation properties, but don't count space consumed by children file systems, clones and snapshots. - Sparse volumes ZVOLs that don't reserve space in the pool. - External attributes Compatible with extattr(2). - NFSv4-ACLs Not sure about the status, might not be complete yet. Submitted by: trasz - Creation-time properties - Regression tests for zpool(8) command. Obtained from: OpenSolaris
# 184652	04-Nov-2008	jhb	Remove unnecessary locking around vn_fullpath(). The vnode lock for the vnode in question does not need to be held. All the data structures used during the name lookup are protected by the global name cache lock. Instead, the caller merely needs to ensure a reference is held on the vnode (such as vhold()) to keep it from being freed. In the case of procfs' <pid>/file entry, grab the process lock while we gain a new reference (via vhold()) on p_textvp to fully close races with execve(2). For the kern.proc.vmmap sysctl handler, use a shared vnode lock around the call to VOP_GETATTR() rather than an exclusive lock. MFC after: 1 month
# 184492	31-Oct-2008	peter	Add three extra to the kinfo_proc_vmmap data. kve_offset - the offset within an object that a mapping refers to. fileid and fsid are inode/dev for vnodes. (Linux procfs has these and valgrind is really unhappy without them.) I believe I didn't change the size of the struct.
# 184205	23-Oct-2008	des	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months
# 183076	16-Sep-2008	ed	Fix minor TTY API inconsistency. Unlike tty_rel_gone() and tty_rel_sess(), the tty_rel_pgrp() routine does not unlock the TTY. I once had the idea to make the code call tty_rel_pgrp() and tty_rel_sess(), picking up the TTY lock once. This turned out a little harder than I expected, so this is how it works now. It's a lot easier if we just let tty_rel_pgrp() unlock the TTY, because the other routines do this anyway.
# 182750	04-Sep-2008	kevlo	If the process id specified is invalid, the system call returns ESRCH
# 181905	20-Aug-2008	ed	Integrate the new MPSAFE TTY layer to the FreeBSD operating system. The last half year I've been working on a replacement TTY layer for the FreeBSD kernel. The new TTY layer was designed to improve the following: - Improved driver model: The old TTY layer has a driver model that is not abstract enough to make it friendly to use. A good example is the output path, where the device drivers directly access the output buffers. This means that an in-kernel PPP implementation must always convert network buffers into TTY buffers. If a PPP implementation would be built on top of the new TTY layer (still needs a hooks layer, though), it would allow the PPP implementation to directly hand the data to the TTY driver. - Improved hotplugging: With the old TTY layer, it isn't entirely safe to destroy TTY's from the system. This implementation has a two-step destructing design, where the driver first abandons the TTY. After all threads have left the TTY, the TTY layer calls a routine in the driver, which can be used to free resources (unit numbers, etc). The pts(4) driver also implements this feature, which means posix_openpt() will now return PTY's that are created on the fly. - Improved performance: One of the major improvements is the per-TTY mutex, which is expected to improve scalability when compared to the old Giant locking. Another change is the unbuffered copying to userspace, which is both used on TTY device nodes and PTY masters. Upgrading should be quite straightforward. Unlike previous versions, existing kernel configuration files do not need to be changed, except when they reference device drivers that are listed in UPDATING. Obtained from: //depot/projects/mpsafetty/... Approved by: philip (ex-mentor) Discussed: on the lists, at BSDCan, at the DevSummit Sponsored by: Snow B.V., the Netherlands dcons(4) fixed by: kan
# 180799	25-Jul-2008	kib	Call pargs_drop() unconditionally in do_execve(), the function correctly handles the NULL argument. Make pargs_free() static. MFC after: 1 week
# 179276	24-May-2008	jb	Add DTrace 'proc' provider probes using the Statically Defined Trace (sdt) mechanism.
# 177368	19-Mar-2008	jeff	- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.
# 177091	12-Mar-2008	jeff	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
# 175219	10-Jan-2008	rwatson	Don't zero td_runtime when billing thread CPU usage to the process; maintain a separate td_incruntime to hold unbilled CPU usage for the thread that has the previous properties of td_runtime. When thread information is requested using the thread monitoring sysctls, export thread td_runtime instead of process rusage runtime in kinfo_proc. This restores the display of individual ithread and other kernel thread CPU usage since inception in ps -H and top -SH, as well for libthr user threads, valuable debugging information lost with the move to try kthreads since they are no longer independent processes. There is universal agreement that we should rewrite the process and thread export sysctls, but this commit gets things going a bit better in the mean time. Likewise, there are resevations about the continued validity of statclock given the speed of modern processors. Reviewed by: attilio, emaste, jhb, julian
# 175202	09-Jan-2008	attilio	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
# 174947	27-Dec-2007	rwatson	Return ESRCH when a kernel stack is queried on a process in execve() -- p_candebug() will return EAGAIN which, if the other process never leaves execve(), will result in the sysctl spinning and never returning to userspace. Processes should always eventually leave execve(), but spinning in kernel while we wait is bad for countless reasons, and particularly harmful if execve() itself is deadlocked. Possibly we should return another error, or return a marker indicating the thread is in execve() so it can be reported that way in userspace. Reported by: kris
# 174481	09-Dec-2007	rwatson	Check for P_WEXIT before PHOLD() on a process in kstack and vm query sysctls, as PHOLD() asserts !P_WEXIT. Reported by: Michael Plass <mfp49_freebsd at plass-family dot net>
# 174197	02-Dec-2007	rwatson	Add another new sysctl in support of the forthcoming procstat(1) to support its -k argument: kern.proc.kstack - dump the kernel stack of a process, if debugging is permitted. This sysctl is present if either "options DDB" or "options STACK" is compiled into the kernel. Having support for tracing the kernel stacks of processes from user space makes it much easier to debug (or understand) specific wmesg's while avoiding the need to enter DDB in order to determine the path by which a process came to be blocked on a particular wait channel or lock.
# 174167	02-Dec-2007	rwatson	Add two new sysctls in support of the forthcoming procstat(1) to support its -f and -v arguments: kern.proc.filedesc - dump file descriptor information for a process, if debugging is permitted, including socket addresses, open flags, file offsets, file paths, etc. kern.proc.vmmap - dump virtual memory mapping information for a process, if debugging is permitted, including layout and information on underlying objects, such as the type of object and path. These provide a superset of the information historically available through the now-deprecated procfs(4), and are intended to be exported in an ABI-robust form.
# 173781	20-Nov-2007	rwatson	Test that p_textvp is non-NULL be dereferencing, as no executable vnode is set for kernel processes. Reported by: Skip Ford <skip at menantico dot com> MFC after: 3 days
# 173629	15-Nov-2007	rrs	Adds an event handler for: - process_ctor,dtor, init and fini - thread_ctor,dtor, init and fini This allows the ability to add on additional things during construction/destruction of threads and processes. Reviewed by: rwatson
# 173361	05-Nov-2007	kib	Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when the kmem_alloc_nofault() failed to allocate address space. Both functions now return error instead of panicing or dereferencing NULL. As consequence, vmspace_exec() and vmspace_unshare() returns the errno int. struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done. The kernel stack for the thread is now set up in the thread_alloc(), that itself may return NULL. Also, allocation of the first process thread is performed in the fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup() called from fork1(), and proc_linkup0(), that is used to set up the kernel process (was known as swapper). In collaboration with: Peter Holm Reviewed by: jhb
# 172264	21-Sep-2007	jeff	- Redefine p_swtime and td_slptime as p_swtick and td_slptick. This changes the units from seconds to the value of 'ticks' when swapped in/out. ULE does not have a periodic timer that scans all threads in the system and as such maintaining a per-second counter is difficult. - Change computations requiring the unit in seconds to subtract ticks and divide by hz. This does make the wraparound condition hz times more frequent but this is still in the range of several months to years and the adverse effects are minimal. Approved by: re
# 172207	17-Sep-2007	jeff	- Move all of the PS_ flags into either p_flag or td_flags. - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time. - Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler related swap flags into td_flags. - Keep ki_sflag for backwards compat but change all in source tools to use the new and more correct location of P_INMEM. Reported by: pho Reviewed by: attilio, kib Approved by: re (kensmith)
# 170472	09-Jun-2007	attilio	rufetch and calcru sometimes should be called atomically together. This patch fixes places where they should be called atomically changing their locking requirements (both assume per-proc spinlock held) and introducing rufetchcalc which wrappers both calls to be performed in atomic way. Reviewed by: jeff Approved by: jeff (mentor)
# 170307	04-Jun-2007	jeff	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
# 170174	31-May-2007	jeff	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)
# 167827	23-Mar-2007	emaste	Stop setting ki_ocomm (thread name) to the proc name by default, as nothing in the base system relies on this any longer.
# 164936	06-Dec-2006	julian	Threading cleanup.. part 2 of several. Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still reqires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.
# 163709	26-Oct-2006	jb	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@
# 162873	30-Sep-2006	pjd	Remove duplicated $FreeBSD$.
# 162706	27-Sep-2006	mbr	Move Giant up even further since P_CONTROLT isn't really fully locked yet (p_flag is, but P_CONTROLT isn't really). Submitted by: jhb
# 162581	23-Sep-2006	mbr	Protect enterpgrp() against another tty/proc race case until the tty locking work has been fixed. MFC after: 1 week
# 162452	19-Sep-2006	mbr	Fix races between tty.c and sessrele() / doenterpgrp() / leavepgrp(). The tty code is still under giant lock, but the session/pgrp release code just used proctree_locks. This explains why moving the proctree_lock in sys/kern/tty.c rev. 1.258 did fix the panics in our SMP systems. This should also fix some race panics with revoked ttys. Reviewed by: jhb MFC after: 1 week
# 155534	11-Feb-2006	phk	CPU time accounting speedup (step 2) Keep accounting time (in per-cpu) cputicks and the statistics counts in the thread and summarize into struct proc when at context switch. Don't reach across CPUs in calcru(). Add code to calibrate the top speed of cpu_tickrate() for variable cpu_tick hardware (like TSC on power managed machines). Don't enforce monotonicity (at least for now) in calcru. While the calibrated cpu_tickrate ramps up it may not be true. Use 27MHz counter on i386/Geode. Use TSC on amd64 & i386 if present. Use tick counter on sparc64
# 155444	07-Feb-2006	phk	Modify the way we account for CPU time spent (step 1) Keep track of time spent by the cpu in various contexts in units of "cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t only when somebody wants to inspect the numbers. For now "cputicks" are still derived from the current timecounter and therefore things should by definition remain sensible also on SMP machines. (The main reason for this first milestone commit is to verify that hypothesis.) On slower machines, the avoided multiplications to normalize timestams at every context switch, comes out as a 5-7% better score on the unixbench/context1 microbenchmark. On more modern hardware no change in performance is seen.
# 154535	18-Jan-2006	julian	Return the thread name in the kinfo_proc structure. Also correct the comment describing what the value is.
# 154490	17-Jan-2006	jmallett	Since p_cansee will end up dereferencing p_ucred, don't check for p_ucred equal to NULL several times later. p_ucred "should probably not" be NULL if the process isn't PRS_NEW anyway. This is strongly reinforced by the fact that we don't see frequent crashes here. Remove the checks after p_cansee and add a KASSERT right before it. Found by: Coverity Prevent (tm) Also trim one nearby trailing space.
# 153835	29-Dec-2005	davidxu	Add code to report zombie state. PR: threads/91044 MFC after: 3 days
# 152376	13-Nov-2005	rwatson	Moderate rewrite of kernel ktrace code to attempt to generally improve reliability when tracing fast-moving processes or writing traces to slow file systems by avoiding unbounded queueuing and dropped records. Record loss was previously possible when the global pool of records become depleted as a result of record generation outstripping record commit, which occurred quickly in many common situations. These changes partially restore the 4.x model of committing ktrace records at the point of trace generation (synchronous), but maintain the 5.x deferred record commit behavior (asynchronous) for situations where entering VFS and sleeping is not possible (i.e., in the scheduler). Records are now queued per-process as opposed to globally, with processes responsible for committing records from their own context as required. - Eliminate the ktrace worker thread and global record queue, as they are no longer used. Keep the global free record list, as records are still used. - Add a per-process record queue, which will hold any asynchronously generated records, such as from context switches. This replaces the global queue as the place to submit asynchronous records to. - When a record is committed asynchronously, simply queue it to the process. - When a record is committed synchronously, first drain any pending per-process records in order to maintain ordering as best we can. Currently ordering between competing threads is provided via a global ktrace_sx, but a per-process flag or lock may be desirable in the future. - When a process returns to user space following a system call, trap, signal delivery, etc, flush any pending records. - When a process exits, flush any pending records. - Assert on process tear-down that there are no pending records. - Slightly abstract the notion of being "in ktrace", which is used to prevent the recursive generation of records, as well as generating traces for ktrace events. Future work here might look at changing the set of events marked for synchronous and asynchronous record generation, re-balancing queue depth, timeliness of commit to disk, and so on. I.e., performing a drain every (n) records. MFC after: 1 month Discussed with: jhb Requested by: Marc Olzheim <marcolz at stack dot nl>
# 152185	08-Nov-2005	davidxu	Add support for queueing SIGCHLD same as other UNIX systems did. For each child process whose status has been changed, a SIGCHLD instance is queued, if the signal is stilling pending, and process changed status several times, signal information is updated to reflect latest process status. If wait() returns because the status of a child process is available, pending SIGCHLD signal associated with the child process is discarded. Any other pending SIGCHLD signals remain pending. The signal information is allocated at the same time when proc structure is allocated, if process signal queue is fully filled or there is a memory shortage, it can still send the signal to process. There is a booting time tunable kern.sigqueue.queue_sigchild which can control the behavior, setting it to zero disables the SIGCHLD queueing feature, the tunable will be removed if the function is proved that it is stable enough. Tested on: i386 (SMP and UP)
# 151630	24-Oct-2005	jhb	Document in #ifdef notnow code the actions that proc_fini would need to take if struct procs were actually freed.
# 150843	02-Oct-2005	truckman	Always wire the sysctl output buffer in sysctl_kern_proc() before calling sysctl_out_proc(). -- fix from jhb Move the code in fill_kinfo_thread() that gathers data from struct proc into the new function fill_kinfo_proc_only(). Change all callers of fill_kinfo_thread() to call both fill_kinfo_proc_only() and fill_kinfo() thread. When gathering data from a multi-threaded process, fill_kinfo_proc_only() only needs to be called once. Grab sched_lock before accessing the process thread list or calling fill_kinfo_thread(). PR: kern/84684 MFC after: 3 days
# 150630	27-Sep-2005	jhb	Use the refcount API to implement reference counts on process argument structures rather than using a global mutex to protect the reference counts. Tested on: i386, alpha, sparc64
# 145216	18-Apr-2005	das	Add a sysctl that returns the full path of a process' text file. This information is needed by things like `gdb -p' and Sun's javac, and previously it could only be obtained via procfs
# 144637	04-Apr-2005	jhb	Divorce critical sections from spinlocks. Critical sections as denoted by critical_enter() and critical_exit() are now solely a mechanism for deferring kernel preemptions. They no longer have any affect on interrupts. This means that standalone critical sections are now very cheap as they are simply unlocked integer increments and decrements for the common case. Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter() and spinlock_exit(). This KPI is responsible for providing whatever MD guarantees are needed to ensure that a thread holding a spin lock won't be preempted by any other code that will try to lock the same lock. For now all archs continue to block interrupts in a "spinlock section" as they did formerly in all critical sections. Note that I've also taken this opportunity to push a few things into MD code rather than MI. For example, critical_fork_exit() no longer exists. Instead, MD code ensures that new threads have the correct state when they are created. Also, we no longer try to fixup the idlethreads for APs in MI code. Instead, each arch sets the initial curthread and adjusts the state of the idle thread it borrows in order to perform the initial context switch. This change is largely a big NOP, but the cleaner separation it provides will allow for more efficient alternative locking schemes in other parts of the kernel (bare critical sections rather than per-CPU spin mutexes for per-CPU data for example). Reviewed by: grehan, cognet, arch@, others Tested on: i386, alpha, sparc64, powerpc, arm, possibly more
# 143870	20-Mar-2005	pjd	Add ki_jid field to the kinfo_proc structure and store jail ID there. Reviewed by: gad MFC after: 3 days
# 143740	17-Mar-2005	phk	In stange circumstances we may end up being the last reference to a session in tprintf(). SESSRELE() needs to properly dispose of the sessions mutex. Add sessrele() which does the proper cleanup and have SESSRELE() call it. Use SESSRELE also in pgdelete(). Found by: Coverity (ID:526)
# 143467	12-Mar-2005	pjd	Function jailed() looks into ucred strcture, so be sure ucred is not NULL. Reviewed by: rwatson MFC after: 1 week
# 143466	12-Mar-2005	pjd	Clean up a bit. Reviewed by: rwatson MFC after: 1 week
# 141625	10-Feb-2005	phk	Make a bunch of SYSCTL_NODEs static.
# 139804	06-Jan-2005	imp	/* -> /*- for copyright notices, minor format tweaks as necessary
# 138128	27-Nov-2004	das	Axe a.out core dump support. Neither older gdb binaries nor current bfd sources understand the present format.
# 137946	20-Nov-2004	das	Remove local definitions of RANGEOF() and use __rangeof() instead. Also remove a few bogus casts.
# 137909	20-Nov-2004	das	Malloc p_stats instead of putting it in the U area. We should consider simply embedding it in struct proc. Reviewed by: arch@
# 136344	10-Oct-2004	julian	Remove duplicate line.
# 136152	05-Oct-2004	jhb	Rework how we store process times in the kernel such that we always store the raw values including for child process statistics and only compute the system and user timevals on demand. - Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result. - Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage(). - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc. - Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel. - calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did. - calcru() now correctly handles threads executing on other CPUs. - A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock. - This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone. - The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console. Submitted by: bde (mostly) MFC after: 1 month
# 135470	19-Sep-2004	das	The zone from which proc structures are allocated is marked UMA_ZONE_NOFREE to guarantee type stability, so proc_fini() should never be called. Move an assertion from proc_fini() to proc_dtor() and garbage-collect the rest of the unreachable code. I have retained vm_proc_dispose(), since I consider its disuse a bug.
# 134791	05-Sep-2004	julian	Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces. The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time. The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own. Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens. Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure. The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure. (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring. A scheduler call sched_set_concurrency(kg, N) has been added that notifies teh scheduler that no more than N threads from that ksegrp should be allowed to be on concurrently scheduled. This is also used to enforce 'fainess' at this time so that a ksegrp with 10000 threads can not swamp a the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventualy develop their own methods to do this now that they are effectively separated. Rejig libthr's kernel interface to follow the same code paths as linkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can. Thread exit code has been cleaned up greatly. exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week
# 133722	14-Aug-2004	rwatson	Cause pfind() not to return processes in the PRS_NEW state. As a result, threads consuming the result of pfind() will not need to check for a NULL credential pointer or other signs of an incompletely created process. However, this also means that pfind() cannot be used to test for the existence or find such a process. Annotate pfind() to indicate that this is the case. A review of curent consumers seems to indicate that this is not a problem for any of them. This closes a number of race conditions that could result in NULL pointer dereferences and related failure modes. Other related races continue to exist, especially during iteration of the allproc list without due caution. Discussed with: tjr, green
# 133402	09-Aug-2004	julian	Remove typos on KASSERT messages.
# 132987	01-Aug-2004	green	* Add a "how" argument to uma_zone constructors and initialization functions so that they know whether the allocation is supposed to be able to sleep or not. * Allow uma_zone constructors and initialation functions to return either success or error. Almost all of the ones in the tree currently return success unconditionally, but mbuf is a notable exception: the packet zone constructor wants to be able to fail if it cannot suballocate an mbuf cluster, and the mbuf allocators want to be able to fail in general in a MAC kernel if the MAC mbuf initializer fails. This fixes the panics people are seeing when they run out of memory for mbuf clusters. * Allow debug.nosleepwithlocks on WITNESS to be disabled, without changing the default. Both bmilekic and jeff have reviewed the changes made to make failable zone allocations work.
# 132855	29-Jul-2004	pjd	Fill some informations about zombie processes as well. Before this change every zombie process were reported as an owner of PID 0 in ps(1) output. Reviewed by: julian
# 130826	20-Jun-2004	gad	Fill in the values for the ki_tid and ki_numthreads which have been added to kproc_info. PR: bin/65803 (a tiny part...) Submitted by: Cyrille Lefevre
# 130759	20-Jun-2004	gad	Add a call to calcru() to update the kproc_info fields of ki_rusage.ru_utime and ki_rusage.ru_stime. This greatly improves the accuracy of those fields. Suggested by: bde
# 130729	19-Jun-2004	gad	This is just a forced commit to note that the previous update was from: PR: bin/65803 (a very tiny piece of the PR)
# 130727	19-Jun-2004	gad	Fill in the some new fields 'struct kinfo_proc', namely ki_childstime, ki_childutime, and ki_emul. Also uses the timevaladd() routine to correct the calculation of ki_childtime. That will correct the value returned when ki_childtime.tv_usec > 1,000,000. This also implements a new KERN_PROC_GID option for kvm_getprocs(). (there will be a similar update to lib/libkvm/kvm_proc.c) Submitted by: Cyrille Lefevre
# 130640	17-Jun-2004	phk	Second half of the dev_t cleanup. The big lines are: NODEV -> NULL NOUDEV -> NODEV udev_t -> dev_t udev2dev() -> findcdev() Various minor adjustments including handling of userland access to kernel space struct cdev etc.
# 130551	15-Jun-2004	julian	Nice, is a property of a process as a whole.. I mistakenly moved it to the ksegroup when breaking up the process structure. Put it back in the proc structure.
# 130261	09-Jun-2004	phk	Reference count struct tty. Add two new functions: ttyref() and ttyrel(). ttymalloc() creates a struct tty with a reference count of one. when ttyrel sees the count go to zero, struct tty is freed. Hold references for open ttys and for ttys which are controlling terminal for sessions. Until drivers start using ttyrel(), this commit will make no difference.
# 130260	09-Jun-2004	phk	Fix a race in destruction of sessions.
# 129599	22-May-2004	gad	Implement the new KERN_PROC_RGID option, and also implement the KERN_PROC_SESSION option which had been previously defined but never implemented. PR: bin/65803 (a very tiny piece of the PR)` Submitted by: Cyrille Lefevre
# 127911	05-Apr-2004	imp	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core
# 127695	31-Mar-2004	pjd	Remove ps_argsopen check. It is was bogus in the past and was corrected not quite well by me - if kern.ps_argsopen was set to 0, users weren't permitted to see arguments of even own processes. But kern.ps_argsopen is going away, so just remove this check and leave security checks for p_cansee() function.
# 127123	17-Mar-2004	pjd	Fix information leakage. Without this fix it is possible to cheat policies like: - sysctl security.bsd.see_other_[gu]ids=0, - mac_seeotheruids(4), - jail(2) and get full processes list with their arguments. This problem exists from revision 1.62 of kern_proc.c when it was introduced. Reviewed by: nectar, rwatson.
# 126253	25-Feb-2004	truckman	Split the mlock() kernel code into two parts, mlock(), which unpacks the syscall arguments and does the suser() permission check, and kern_mlock(), which does the resource limit checking and calls vm_map_wire(). Split munlock() in a similar way. Enable the RLIMIT_MEMLOCK checking code in kern_mlock(). Replace calls to vslock() and vsunlock() in the sysctl code with calls to kern_mlock() and kern_munlock() so that the sysctl code will obey the wired memory limits. Nuke the vslock() and vsunlock() implementations, which are no longer used. Add a member to struct sysctl_req to track the amount of memory that is wired to handle the request. Modify sysctl_wire_old_buffer() to return an error if its call to kern_mlock() fails. Only wire the minimum of the length specified in the sysctl request and the length specified in its argument list. It is recommended that sysctl handlers that use sysctl_wire_old_buffer() should specify reasonable estimates for the amount of data they want to return so that only the minimum amount of memory is wired no matter what length has been specified by the request. Modify the callers of sysctl_wire_old_buffer() to look for the error return. Modify sysctl_old_user to obey the wired buffer length and clean up its implementation. Reviewed by: bms
# 126125	22-Feb-2004	deischen	Add sysctls to allow showing threads for pgrp, tty, uid, ruid, and pid.
# 121127	16-Oct-2003	jeff	- Update the sched api. sched_{add,rem,clock,pctcpu} now all accept a td argument rather than a kse.
# 121104	15-Oct-2003	peter	The KERN_PROC_PROC sysctl took 4 args in 5.0-REL and 5.1-REL. We need to accept this for a bit longer. Requiring the new order of 3 args only was not very helpful.
# 120830	05-Oct-2003	tjr	Remove support for the unused 4th component of the KERN_PROC_PROC sysctl.
# 120233	19-Sep-2003	tjr	Allow the KERN_PROC_PROC sysctl to be used without the useless 4th name component, for consistency with KERN_PROC_ALL. Support for the 4-argument form will be removed some time before 5.2-R.
# 118488	05-Aug-2003	davidxu	kse.h is not needed for these files.
# 117703	17-Jul-2003	robert	Correct six return statements which returned zero instead of an appropriate error number after a failure condition. In particular, three of the changed statements return ESRCH for a failed pfind(), and in also three places a non-zero return from p_cansee() will be passed back, Also noticed by: rwatson
# 117464	12-Jul-2003	robert	Make the system call vector name of a process accessible to user land applications by introducing the KERN_PROC_SV_NAME sysctl node, which is searchable by PID.
# 116498	17-Jun-2003	scottl	Drop the proc lock around SYSCTL_OUT in the no-threads case. Submitted by: truckman
# 116328	14-Jun-2003	alc	Move the _new_altkstack() and _dispose_altkstack() functions out of the various pmap implementations into the machine-independent vm. They were all identical.
# 116262	12-Jun-2003	scottl	Add support to sysctl_kern_proc to return all threads in a proc, not just the first one. The old behaviour can be switched by specifying KERN_PROC_PROC. Submitted by: julian, tweaks and added functionality by myself
# 116182	10-Jun-2003	obrien	Use __FBSDID().
# 114983	13-May-2003	jhb	- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe. Reviewed by: arch@ Approved by: re (rwatson)
# 114461	01-May-2003	jhb	Initialize and destroy the struct proc mutex in the proc zone's init and fini routines instead of in fork() and wait(). This has the nice side benefit that the proc lock of any process on the allproc list is always valid and sched_lock doesn't have to be used to test against PRS_NEW anymore.
# 114434	01-May-2003	des	Instead of recording the Unix time in a process when it starts, record the uptime. Where necessary, convert it back to Unix time by adding boottime to it. This fixes a potential problem in the accounting code, which would compute the elapsed time incorrectly if the Unix time was stepped during the lifetime of the process.
# 113993	24-Apr-2003	tjr	Include altkstack pages in the RSS regardless of whether the process is swapped out. Pointed out by jhb.
# 113966	24-Apr-2003	des	It seems that 1 was not a magic value as I thought, but a coincidence. Instead of applying the adjustment to processes with a start time of 1, apply it to all processes with a start time of less than 3600. None of this would be necessary if the start times were recorded in ticks instead of seconds and microseconds.
# 113965	24-Apr-2003	tjr	Do a better job of calculating the RSS for swapped-out processes: don't include the kernel stacks of swapped-out threads in the page count, but do include the alternate kernel stack. jhb provided some helpful comments on this. PR: 49102
# 113954	24-Apr-2003	des	When filling out a kinfo_proc structure, if we come across a process whose p_stats->p_start has the magic value 1, replace it with boottime. Some users were apparently confused by the fact that ps(1) reported a start time in early 1970 for system processes.
# 113683	18-Apr-2003	jhb	- Add a static function pgadjustjobc() to adjust the job control count for a process group. - Call pgadjustjobc() twice in fixjobc() to avoid code duplication and improve readability. - Use the proc lock to protect P_SHOULDSTOP() instead of sched_lock. - Check to see if a process is PRS_NEW with sched_lock before trying to lock its proc lock since the lock may not be constructed yet.
# 113339	10-Apr-2003	julian	Move the _oncpu entry from the KSE to the thread. The entry in the KSE still exists but it's purpose will change a bit when we add the ability to lock a KSE to a cpu.
# 112888	31-Mar-2003	jeff	- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with a follow on commit to kern_sig.c - signotify() now operates on a thread since unmasked pending signals are stored in the thread. - PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.
# 112198	13-Mar-2003	jhb	- Cache a reference to the credential of the thread that starts a ktrace in struct proc as p_tracecred alongside the current cache of the vnode in p_tracep. This credential is then used for all later ktrace operations on this file rather than using the credential of the current thread at the time of each ktrace event. - Now that we have multiple ktrace-related items in struct proc that are pointers, rename p_tracep to p_tracevp to make it less ambiguous. Requested by: rwatson (1)
# 112157	12-Mar-2003	jhb	- Various little style fixes. - If SYSCTL_OUT() fails in sysctl_kern_proc_args(), return the error instead of ignoring it if we have new arguments for the process. - If the new arguments for a process are too long, return ENOMEM instead of returning success but not doing the actual copy. Submitted by: bde
# 112152	12-Mar-2003	jhb	- Avoid dropping the proc lock around a simple permissions check and just hold hold it across the check to avoid extra lock operations in the common case. - Copy in the new args to a temporary pargs structure before we drop the reference to the old one. Thus, if the copyin() fails, the process arguments are unchanged rather than being deleted. Also, p_args is no longer NULL during the sysctl operation.
# 111585	27-Feb-2003	julian	Change the process flags P_KSES to be P_THREADED. This is just a cosmetic change but I've been meaning to do it for about a year.
# 111119	19-Feb-2003	imp	Back out M_* changes, per decision of the TRB. Approved by: trb
# 109623	21-Jan-2003	alfred	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
# 108660	04-Jan-2003	hsu	Remove unnecessary lock assertion.
# 108470	30-Dec-2002	schweikh	Fix typos, mostly s/ an / a / where appropriate and a few s/an/and/ Add FreeBSD Id tag where missing.
# 107137	21-Nov-2002	jeff	- Add the new sched_pctcpu() function to the sched_* api. - Provide a routine in sched_4bsd to add this functionality. - Use sched_pctcpu() in kern_proc, which is the one place outside of sched_4bsd where the old pctcpu value was accessed directly. Approved by: re
# 107126	20-Nov-2002	jeff	- Implement a mechanism for allowing schedulers to place scheduler dependant data in the scheduler independant structures (proc, ksegrp, kse, thread). - Implement unused stubs for this mechanism in sched_4bsd. Approved by: re Reviewed by: luigi, trb Tested on: x86, alpha
# 105854	24-Oct-2002	julian	Move thread related code from kern_proc.c to kern_thread.c. Add code to free KSEs and KSEGRPs on exit. Sort KSE prototypes in proc.h. Add the missing kse_exit() syscall. ksetest now does not leak KSEs and KSEGRPS. Submitted by: (parts) davidxu
# 105674	22-Oct-2002	davidxu	detect idle kse correctly.
# 105559	20-Oct-2002	julian	Add an actual implementation of kse_wakeup() Submitted by: Davidxu
# 105354	17-Oct-2002	robert	Use strlcpy() instead of strncpy() to copy NUL terminated strings for safety and consistency.
# 105141	14-Oct-2002	jhb	- Add a new global mutex 'ppeers_lock' to protect the p_peers list of processes forked with RFTHREAD. - Use a goto to a label for common code when exiting from fork1() in case of an error. - Move the RFTHREAD linkage setup code later in fork since the ppeers_lock cannot be locked while holding a proc lock. Handle the race of a task leader exiting and killing its peers while a peer is forking a new child. In that case, go ahead and let the peer process proceed normally as the parent is about to kill it. However, the task leader may have already gone to sleep to wait for the peers to die, so the new child process may not receive a SIGKILL from the task leader. Rather than try to destruct the new child process, just go ahead and send it a SIGKILL directly and add it to the p_peers list. This ensures that the task leader will wait until both the peer process doing the fork() and the new child process have received their KILL signals and exited. Discussed with: truckman (earlier versions)
# 104695	09-Oct-2002	julian	Round out the facilty for a 'bound' thread to loan out its KSE in specific situations. The owner thread must be blocked, and the borrower can not proceed back to user space with the borrowed KSE. The borrower will return the KSE on the next context switch where teh owner wants it back. This removes a lot of possible race conditions and deadlocks. It is consceivable that the borrower should inherit the priority of the owner too. that's another discussion and would be simple to do. Also, as part of this, the "preallocatd spare thread" is attached to the thread doing a syscall rather than the KSE. This removes the need to lock the scheduler when we want to access it, as it's now "at hand". DDB now shows a lot mor info for threaded proceses though it may need some optimisation to squeeze it all back into 80 chars again. (possible JKH project) Upcalls are now "bound" threads, but "KSE Lending" now means that other completing syscalls can be completed using that KSE before the upcall finally makes it back to the UTS. (getting threads OUT OF THE KERNEL is one of the highest priorities in the KSE system.) The upcall when it happens will present all the completed syscalls to the KSE for selection.
# 104387	02-Oct-2002	jhb	Rename the mutex thread and process states to use a more generic 'LOCK' name instead. (e.g., SLOCK instead of SMTX, TD_ON_LOCK() instead of TD_ON_MUTEX()) Eventually a turnstile abstraction will be added that will be shared with mutexes and other types of locks. SLOCK/TDI_LOCK will be used internally by the turnstile code and will not be specific to mutexes. Making the change now ensures that turnstiles can be dropped in at a later date without affecting the ABI of userland applications.
# 104379	02-Oct-2002	archie	Let kse_wakeup() take a KSE mailbox pointer argument. Reviewed by: julian
# 104354	02-Oct-2002	scottl	Some kernel threads try to do significant work, and the default KSTACK_PAGES doesn't give them enough stack to do much before blowing away the pcb. This adds MI and MD code to allow the allocation of an alternate kstack who's size can be speficied when calling kthread_create. Passing the value 0 prevents the alternate kstack from being created. Note that the ia64 MD code is missing for now, and PowerPC was only partially written due to the pmap.c being incomplete there. Though this patch does not modify anything to make use of the alternate kstack, acpi and usb are good candidates. Reviewed by: jake, peter, jhb
# 104306	01-Oct-2002	jmallett	Back our kernel support for reliable signal queues. Requested by: rwatson, phk, and many others
# 104245	30-Sep-2002	jmallett	(Forced commit, to clarify previous commit of ksiginfo/signal queue code.) I've added a structure, kernel-private, to represent a pending or in-delivery signal, called `ksiginfo'. It is roughly analogous to the basic information that is exported by the POSIX interface 'siginfo_t', but more basic. I've added functions to allocate these structures, and further to wrap all signal operations using them. Once the operations are wrapped, I've added a TailQ (see queue(3)) of these structures to 'struct proc', and all pending signals are in that TailQ. When a signal is being delivered, it is dequeued from the list. Once I finish the spreading of ksiginfo throughout the tree, the dequeued structure will be delivered to the process in question, whereas currently and normally, the signal number is what is used.
# 104233	30-Sep-2002	jmallett	First half of implementation of ksiginfo, signal queues, and such. This gets signals operating based on a TailQ, and is good enough to run X11, GNOME, and do job control. There are some intricate parts which could be more refined to match the sigset_t versions, but those require further evaluation of directions in which our signal system can expand and contract to fit our needs. After this has been in the tree for a while, I will make in kernel API changes, most notably to trapsignal(9) and sendsig(9), to use ksiginfo more robustly, such that we can actually pass information with our (queued) signals to the userland. That will also result in using a struct ksiginfo pointer, rather than a signal number, in a lot of kern_sig.c, to refer to an individual pending signal queue member, but right now there is no defined behaviour for such. CODAFS is unfinished in this regard because the logic is unclear in some places. Sponsored by: New Gold Technology Reviewed by: bde, tjr, jake [an older version, logic similar]
# 104157	29-Sep-2002	julian	Implement basic KSE loaning. This stops a hread that is blocked in BOUND mode from stopping another thread from completing a syscall, and this allows it to release its resources etc. Probably more related commits to follow (at least one I know of) Initial concept by: julian, dillon Submitted by: davidxu
# 104082	28-Sep-2002	julian	Rewrite the kse_create() function to better aproach the semantics we have specified in the design.
# 103972	25-Sep-2002	archie	Make the following name changes to KSE related functions, etc., to better represent their purpose and minimize namespace conflicts: kse_fn_t -> kse_func_t struct thread_mailbox -> struct kse_thr_mailbox thread_interrupt() -> kse_thr_interrupt() kse_yield() -> kse_release() kse_new() -> kse_create() Add missing declaration of kse_thr_interrupt() to <sys/kse.h>. Regenerate the various generated syscall files. Minor style fixes. Reviewed by: julian
# 103858	23-Sep-2002	julian	oops don't do dthe copy range in a new KSE. There isn't one any more.
# 103835	23-Sep-2002	julian	Add code to create > 1 KSe per process. (support code not yet complete) Submitted by: davidxu
# 103410	16-Sep-2002	mini	Add kernel support needed for the KSE-aware libpthread: - Use ucontext_t's to store KSE thread state. - Synthesize state for the UTS upon each upcall, rather than saving and copying a trapframe. - Deliver signals to KSE-aware processes via upcall. - Rename kse mailbox structure fields to be more BSD-like. - Store the UTS's stack in struct proc in a stack_t. Reviewed by: bde, deischen, julian Approved by: -arch
# 103367	15-Sep-2002	julian	Allocate KSEs and KSEGRPs separatly and remove them from the proc structure. next step is to allow > 1 to be allocated per process. This would give multi-processor threads. (when the rest of the infrastructure is in place) While doing this I noticed libkvm and sys/kern/kern_proc.c:fill_kinfo_proc are diverging more than they should.. corrective action needed soon.
# 103216	11-Sep-2002	julian	Completely redo thread states. Reviewed by: davidxu@freebsd.org
# 103083	07-Sep-2002	peter	Make UAREA_PAGES and KSTACK_PAGES visible to userland via sysctl, like PS_STRINGS and USRSTACK is. This is necessary in order to decode a.out core dumps. kern_proc.c was already referring to both of these values but was missing the #include "opt_kstack_pages.h". Make the sysctl variables visible so that certain kld modules can see how their parent kernel was configured.
# 103014	06-Sep-2002	rwatson	Minor spelling tweak: assume "his" is actually "This".
# 103002	06-Sep-2002	julian	Use UMA as a complex object allocator. The process allocator now caches and hands out complete process structures including substructures . i.e. it get's the process structure with the first thread (and soon KSE) already allocated and attached, all in one hit. For the average non threaded program (non KSE that is) the allocated thread and its stack remain attached to the process, even when the process is unused and in the process cache. This saves having to allocate and attach it later, effectively bringing us (hopefully) close to the efficiency of pre-KSE systems where these were a single structure. Reviewed by: davidxu@freebsd.org, peter@freebsd.org
# 101677	11-Aug-2002	schweikh	Fix typos; each file has at least one s/seperat/separat/ (I skipped those in contrib/, gnu/ and crypto/) While I was at it, fixed a lot more found by ispell that I could identify with certainty to be errors. All of these were in comments or text, not in actual code. Suggested by: bde MFC after: 3 days
# 100831	28-Jul-2002	truckman	Wire the sysctl output buffer before grabbing any locks to prevent SYSCTL_OUT() from blocking while locks are held. This should only be done when it would be inconvenient to make a temporary copy of the data and defer calling SYSCTL_OUT() until after the locks are released.
# 99942	14-Jul-2002	julian	Thinking about it I came to the conclusion that the KSE states were incorrectly formulated. The correct states should be: IDLE: On the idle KSE list for that KSEG RUNQ: Linked onto the system run queue. THREAD: Attached to a thread and slaved to whatever state the thread is in. This means that most places where we were adjusting kse state can go away as it is just moving around because the thread is.. The only places we need to adjust the KSE state is in transition to and from the idle and run queues. Reviewed by: jhb@freebsd.org
# 99559	07-Jul-2002	peter	Collect all the (now equivalent) pmap_new_proc/pmap_dispose_proc/ pmap_swapin_proc/pmap_swapout_proc functions from the MD pmap code and use a single equivalent MI version. There are other cleanups needed still. While here, use the UMA zone hooks to keep a cache of preinitialized proc structures handy, just like the thread system does. This eliminates one dependency on 'struct proc' being persistent even after being freed. There are some comments about things that can be factored out into ctor/dtor functions if it is worth it. For now they are mostly just doing statistics to get a feel of how it is working.
# 99124	30-Jun-2002	julian	If the process is a zombie, then you must not try dereference the thread because there isn't one. Of course this code only possibly works for single threaded processes anyhow..
# 99072	29-Jun-2002	julian	Part 1 of KSE-III The ability to schedule multiple threads per process (one one cpu) by making ALL system calls optionally asynchronous. to come: ia64 and power-pc patches, patches for gdb, test program (in tools) Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands) NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..
# 98609	22-Jun-2002	mini	Always drop the p_args reference we held for copyout, even if we're about to change it. This fixes a leak triggered by setproctitle(3). Approved by: alfred Noticed by: Peter Jeremy <peter.jeremy@alcatel.com.au>
# 97997	07-Jun-2002	jhb	Properly lock accesses to p_tracep and p_traceflag. Also make a few ktrace-only things #ifdef KTRACE that were not before.
# 96886	18-May-2002	jhb	Change p_can{debug,see,sched,signal}()'s first argument to be a thread pointer instead of a proc pointer and require the process pointed to by the second argument to be locked. We now use the thread ucred reference for the credential checks in p_can*() as a result. p_canfoo() should now no longer need Giant.
# 96122	06-May-2002	alfred	Make funsetown() take a 'struct sigio **' so that the locking can be done internally. Ensure that no one can fsetown() to a dying process/pgrp. We need to check the process for P_WEXIT to see if it's exiting. Process groups are already safe because there is no such thing as a pgrp zombie, therefore the proctree lock completely protects the pgrp from having sigio structures associated with it after it runs funsetownlst. Add sigio lock to witness list under proctree and allproc, but over proc and pgrp. Seigo Tanimura helped with this.
# 95973	03-May-2002	tanimura	As malloc(9) and free(9) are now Giant-free, remove the Giant lock across malloc(9) and free(9) of a pgrp or a session.
# 95969	03-May-2002	tanimura	Fix the lock order reversal between the sigio lock and a process/pgrp lock in funsetownlst() by locking the sigio lock across funsetownlst().
# 95352	24-Apr-2002	tanimura	Free(9) should be Giant-free. Suggested by: jhb
# 95123	20-Apr-2002	tanimura	Push down Giant for setpgid(), setsid() and aio_daemon(). Giant protects only malloc(9) and free(9).
# 94857	16-Apr-2002	jhb	- Merge the pgrpsess_lock and proctree_lock sx locks into one proctree_lock sx lock. Trying to get the lock order between these locks was getting too complicated as the locking in wait1() was being fixed. - leavepgrp() now requires an exclusive lock of proctree_lock to be held when it is called. - fixjobc() no longer gets a shared lock of proctree_lock now that it requires an xlock be held by the caller. - Locking notes in sys/proc.h are adjusted to note that everything that used to be protected by the pgrpsess_lock is now protected by the proctree_lock.
# 94307	09-Apr-2002	jhb	- Change fill_kinfo_proc() to require that the process is locked when it is called. - Change sysctl_out_proc() to require that the process is locked when it is called and to drop the lock before it returns. If this proves too complex we can change sysctl_out_proc() to simply acquire the lock at the very end and have the calling code drop the lock right after it returns. - Lock the process we are going to export before the p_cansee() in the loop in sysctl_kern_proc() and hold the lock until we call sysctl_out_proc(). - Don't call p_cansee() on the process about to be exported twice in the aforementioned loop.
# 93942	06-Apr-2002	jake	Use CTASSERT rather than a runtime check to detect kinfo_proc size changes. Remove the ugly yuck code to busy wait for 20 seconds.
# 93818	04-Apr-2002	jhb	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64
# 93607	01-Apr-2002	dillon	Stage-2 commit of the critical*() code. This re-inlines cpu_critical_enter() and cpu_critical_exit() and moves associated critical prototypes into their own header file, <arch>/<arch>/critical.h, which is only included by the three MI source files that need it. Backout and re-apply improperly comitted syntactical cleanups made to files that were still under active development. Backout improperly comitted program structure changes that moved localized declarations to the top of two procedures. Partially re-apply one of the program structure changes to move 'mask' into an intermediate block rather then in three separate sub-blocks to make the code more readable. Re-integrate bug fixes that Jake made to the sparc64 code. Note: In general, developers should not gratuitously move declarations out of sub-blocks. They are where they are for reasons of structure, grouping, readability, compiler-localizability, and to avoid developer-introduced bugs similar to several found in recent years in the VFS and VM code. Reviewed by: jake
# 93471	31-Mar-2002	alfred	Close some holes with p->p_args by NULL'ing out the p->p_args pointer while holding the proc lock, and by holding the pargs structure when accessing it from outside of the owner. Submitted by: Jonathan Mini <mini@haikugeek.com>
# 93348	28-Mar-2002	alfred	To remove nested include of sys/lock.h and sys/mutex.h from sys/proc.h make the pargs_* functions into non-inlines in kern/kern_proc.c. Requested by: bde
# 93295	27-Mar-2002	alfred	Make the reference counting of 'struct pargs' SMP safe. There is still some locations where the PROC lock should be held in order to prevent inconsistent views from outside (like the proc->p_fd fix for kern/vfs_syscalls.c:checkdirs()) that can be fixed later. Submitted by: Jonathan Mini <mini@haikugeek.com>
# 93273	27-Mar-2002	jeff	Add a new mtx_init option "MTX_DUPOK" which allows duplicate acquires of locks with this flag. Remove the dup_list and dup_ok code from subr_witness. Now we just check for the flag instead of doing string compares. Also, switch the process lock, process group lock, and uma per cpu locks over to this interface. The original mechanism did not work well for uma because per cpu lock names are unique to each zone. Approved by: jhb
# 93272	27-Mar-2002	dillon	oops, forgot to commit this. td->td_savecrit = 0 replaced by API call cpu_thread_link().
# 93269	27-Mar-2002	jake	Make this compile. Pointy hat to: dillon
# 93076	24-Mar-2002	bde	Fixed some style bugs in the removal of __P(()). The main ones were not removing tabs before "__P((", and not outdenting continuation lines to preserve non-KNF lining up of code with parentheses. Switch to KNF formatting and/or rewrap the whole prototype in some cases.
# 92751	20-Mar-2002	jeff	Remove references to vm_zone.h and switch over to the new uma API. Also, remove maxsockets. If you look carefully you'll notice that the old zone allocator never honored this anyway.
# 92723	19-Mar-2002	alfred	Remove __P.
# 91140	23-Feb-2002	tanimura	Lock struct pgrp, session and sigio. New locks are: - pgrpsess_lock which locks the whole pgrps and sessions, - pg_mtx which protects the pgrp members, and - s_mtx which protects the session members. Please refer to sys/proc.h for the coverage of these locks. Changes on the pgrp/session interface: - pgfind() needs the pgrpsess_lock held. - The caller of enterpgrp() is responsible to allocate a new pgrp and session. - Call enterthispgrp() in order to enter an existing pgrp. - pgsignal() requires a pgrp lock held. Reviewed by: jhb, alfred Tested on: cvsup.jp.FreeBSD.org (which is a quad-CPU machine running -current)
# 91066	22-Feb-2002	phk	Convert p->p_runtime and PCPU(switchtime) to bintime format.
# 90999	20-Feb-2002	julian	Oops, used wrong error value for unimplemented syscalls.
# 90889	19-Feb-2002	julian	Add stub syscalls and definitions for KSE calls. "Book'em Danno"
# 90558	12-Feb-2002	alc	The previous commit included a change to fill_kinfo_proc() that results in a NULL pointer dereference. Repair this mistake.
# 90538	11-Feb-2002	julian	In a threaded world, differnt priorirites become properties of different entities. Make it so. Reviewed by: jhb@freebsd.org (john baldwin)
# 90381	08-Feb-2002	peter	Fix a fatal trap when using ksched_setscheduler() (eg: mozilla, netscape etc) which use: td->td_last_kse->ke_flags \|= KEF_NEEDRESCHED;
# 90378	07-Feb-2002	julian	remove superfluous blank line
# 90375	07-Feb-2002	peter	Fix a couple of style bugs introduced (or touched by) previous commit.
# 90361	07-Feb-2002	julian	Pre-KSE/M3 commit. this is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embeded there but remove all teh code that assumes that in preparation for the next commit which will actually move it out. Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
# 88927	05-Jan-2002	jhb	Fix a bug where the mutex name wasn't always displayed for processes in SMTX in utils such as ps and top. The KI_CTTY flag was assigned to kinfo_proc->ki_kiflag rather than or'd into the flag, thus clobbering any flags set earlier, including KI_MTXBLOCK. Prodding by: peter
# 86324	13-Nov-2001	jhb	As a followup to the previous fixes to inferior, revert some of the changes in 1.80 that were needed for locking that are no longer needed now that a lock is simply asserted. Submitted by: bde
# 86304	12-Nov-2001	jhb	Clean up breakage in inferior() I introduced in 1.92 of kern_proc.c: - Restore inferior() to being iterative rather than recursive. - Assert that the proctree_lock is held in inferior() and change the one caller to get a shared lock of it. This also ensures that we hold the lock after performing the check so the check can't be made invalid out from under us after the check but before we act on it. Requested by: bde
# 84736	09-Oct-2001	rwatson	- Combine kern.ps_showallprocs and kern.ipc.showallsockets into a single kern.security.seeotheruids_permitted, describes as: "Unprivileged processes may see subjects/objects with different real uid" NOTE: kern.ps_showallprocs exists in -STABLE, and therefore there is an API change. kern.ipc.showallsockets does not. - Check kern.security.seeotheruids_permitted in cr_cansee(). - Replace visibility calls to socheckuid() with cr_cansee() (retain the change to socheckuid() in ipfw, where it is used for rule-matching). - Remove prison_unpcb() and make use of cr_cansee() against the UNIX domain socket credential instead of comparing root vnodes for the UDS and the process. This allows multiple jails to share the same chroot() and not see each others UNIX domain sockets. - Remove unused socheckproc(). Now that cr_cansee() is used universally for socket visibility, a variety of policies are more consistently enforced, including uid-based restrictions and jail-based restrictions. This also better-supports the introduction of additional MAC models. Reviewed by: ps, billf Obtained from: TrustedBSD Project
# 83366	12-Sep-2001	julian	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 83281	10-Sep-2001	peter	Add on UPAGES to ki_rssize since it is there as result of the process and can be swapped out with the process.
# 81804	16-Aug-2001	peter	Fix part of another problem that bde pointed out. This is different to what bde suggested though.
# 81800	16-Aug-2001	peter	Remove redundant null-termination. The buffer is already explicitly zeroed, and we intentionally leave -1 on the strncpy length to leave the original \0. Submitted by: bde
# 81759	16-Aug-2001	peter	Use the backwards compatability mechanisms so that ps/top etc dont have unnecessary breakage. While here, use explicit sizes for the string fields so that we dont have unintentional changes again in the future when key tunables change. This still is not quite right, but a june userland is happy with a -current kernel with these tweaks.
# 79335	05-Jul-2001	rwatson	o Replace calls to p_can(..., P_CAN_xxx) with calls to p_canxxx(). The p_can(...) construct was a premature (and, it turns out, awkward) abstraction. The individual calls to p_canxxx() better reflect differences between the inter-process authorization checks, such as differing checks based on the type of signal. This has a side effect of improving code readability. o Replace direct credential authorization checks in ktrace() with invocation of p_candebug(), while maintaining the special case check of KTR_ROOT. This allows ktrace() to "play more nicely" with new mandatory access control schemes, as well as making its authorization checks consistent with other "debugging class" checks. o Eliminate "privused" construct for p_can*() calls which allowed the caller to determine if privilege was required for successful evaluation of the access control check. This primitive is currently unused, and as such, serves only to complicate the API. Approved by: ({procfs,linprocfs} changes) des Obtained from: TrustedBSD Project
# 78519	20-Jun-2001	jhb	Fix some lock order reversals where we called free() while holding a proc lock. We now use temporary variables to save the process argument pointer and just update the pointer while holding the lock. We then perform the free on the cached pointer after releasing the lock.
# 77183	25-May-2001	rwatson	o Merge contents of struct pcred into struct ucred. Specifically, add the real uid, saved uid, real gid, and saved gid to ucred, as well as the pcred->pc_uidinfo, which was associated with the real uid, only rename it to cr_ruidinfo so as not to conflict with cr_uidinfo, which corresponds to the effective uid. o Remove p_cred from struct proc; add p_ucred to struct proc, replacing original macro that pointed. p->p_ucred to p->p_cred->pc_ucred. o Universally update code so that it makes use of ucred instead of pcred, p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo, cr_{r,sv}{u,g}id instead of p_*, etc. o Remove pcred0 and its initialization from init_main.c; initialize cr_ruidinfo there. o Restruction many credential modification chunks to always crdup while we figure out locking and optimizations; generally speaking, this means moving to a structure like this: newcred = crdup(oldcred); ... p->p_ucred = newcred; crfree(oldcred); It's not race-free, but better than nothing. There are also races in sys_process.c, all inter-process authorization, fork, exec, and exit. o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid; remove comments indicating that the old arrangement was a problem. o Restructure exec1() a little to use newcred/oldcred arrangement, and use improved uid management primitives. o Clean up exit1() so as to do less work in credential cleanup due to pcred removal. o Clean up fork1() so as to do less work in credential cleanup and allocation. o Clean up ktrcanset() to take into account changes, and move to using suser_xxx() instead of performing a direct uid==0 comparision. o Improve commenting in various kern_prot.c credential modification calls to better document current behavior. In a couple of places, current behavior is a little questionable and we need to check POSIX.1 to make sure it's "right". More commenting work still remains to be done. o Update credential management calls, such as crfree(), to take into account new ruidinfo reference. o Modify or add the following uid and gid helper routines: change_euid() change_egid() change_ruid() change_rgid() change_svuid() change_svgid() In each case, the call now acts on a credential not a process, and as such no longer requires more complicated process locking/etc. They now assume the caller will do any necessary allocation of an exclusive credential reference. Each is commented to document its reference requirements. o CANSIGIO() is simplified to require only credentials, not processes and pcreds. o Remove lots of (p_pcred==NULL) checks. o Add an XXX to authorization code in nfs_lock.c, since it's questionable, and needs to be considered carefully. o Simplify posix4 authorization code to require only credentials, not processes and pcreds. Note that this authorization, as well as CANSIGIO(), needs to be updated to use the p_cansignal() and p_cansched() centralized authorization routines, as they currently do not take into account some desirable restrictions that are handled by the centralized routines, as well as being inconsistent with other similar authorization instances. o Update libkvm to take these changes into account. Obtained from: TrustedBSD Project Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit
# 76166	01-May-2001	markm	Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
# 75893	23-Apr-2001	jhb	Change the pfind() and zpfind() functions to lock the process that they find before releasing the allproc lock and returning. Reviewed by: -smp, dfr, jake
# 74927	28-Mar-2001	jhb	Convert the allproc and proctree locks from lockmgr locks to sx locks.
# 74877	27-Mar-2001	dwmalone	Don't leak the memory we've just malloced if we can't find the process we're looking for. (I don't think this can currently happen, but it depends how the function is called). PR: 25932 Submitted by: David Xu <davidx@viasoft.com.cn>
# 73941	07-Mar-2001	mckusick	Bitch more loudly when someone botches changes to kinfo_proc in the hopes that they will actually read the comment above it and follow the instructions so as to cause all the rest of us less a lot less grief.
# 73927	07-Mar-2001	jhb	Proc locking including using proc lock in place of proctree where appropriate and locking processes while we signal them.
# 72786	21-Feb-2001	rwatson	o Move per-process jail pointer (p->pr_prison) to inside of the subject credential structure, ucred (cr->cr_prison). o Allow jail inheritence to be a function of credential inheritence. o Abstract prison structure reference counting behind pr_hold() and pr_free(), invoked by the similarly named credential reference management functions, removing this code from per-ABI fork/exit code. o Modify various jail() functions to use struct ucred arguments instead of struct proc arguments. o Introduce jailed() function to determine if a credential is jailed, rather than directly checking pointers all over the place. o Convert PRISON_CHECK() macro to prison_check() function. o Move jail() function prototypes to jail.h. o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the flag in the process flags field itself. o Eliminate that "const" qualifier from suser/p_can/etc to reflect mutex use. Notes: o Some further cleanup of the linux/jail code is still required. o It's now possible to consider resolving some of the process vs credential based permission checking confusion in the socket code. o Mutex protection of struct prison is still not present, and is required to protect the reference count plus some fields in the structure. Reviewed by: freebsd-arch Obtained from: TrustedBSD Project
# 72376	11-Feb-2001	jake	Implement a unified run queue and adjust priority levels accordingly. - All processes go into the same array of queues, with different scheduling classes using different portions of the array. This allows user processes to have their priorities propogated up into interrupt thread range if need be. - I chose 64 run queues as an arbitrary number that is greater than 32. We used to have 4 separate arrays of 32 queues each, so this may not be optimal. The new run queue code was written with this in mind; changing the number of run queues only requires changing constants in runq.h and adjusting the priority levels. - The new run queue code takes the run queue as a parameter. This is intended to be used to create per-cpu run queues. Implement wrappers for compatibility with the old interface which pass in the global run queue structure. - Group the priority level, user priority, native priority (before propogation) and the scheduling class into a struct priority. - Change any hard coded priority levels that I found to use symbolic constants (TTIPRI and TTOPRI). - Remove the curpriority global variable and use that of curproc. This was used to detect when a process' priority had lowered and it should yield. We now effectively yield on every interrupt. - Activate propogate_priority(). It should now have the desired effect without needing to also propogate the scheduling class. - Temporarily comment out the call to vm_page_zero_idle() in the idle loop. It interfered with propogate_priority() because the idle process needed to do a non-blocking acquire of Giant and then other processes would try to propogate their priority onto it. The idle process should not do anything except idle. vm_page_zero_idle() will return in the form of an idle priority kernel thread which is woken up at apprioriate times by the vm system. - Update struct kinfo_proc to the new priority interface. Deliberately change its size by adjusting the spare fields. It remained the same size, but the layout has changed, so userland processes that use it would parse the data incorrectly. The size constraint should really be changed to an arbitrary version number. Also add a debug.sizeof sysctl node for struct kinfo_proc.
# 72250	09-Feb-2001	jhb	Work around some sizeof(long) != sizeof(int) bogons.
# 72200	09-Feb-2001	bmilekic	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
# 71577	24-Jan-2001	jhb	Add a new item to kinfo_proc: ki_sflag to mirror p_sflag.
# 71561	24-Jan-2001	jhb	- Proc locking. - Catch up to proc flag changes. - Reorder the way we get things in fill_kinfoproc() to minimize the number of locking operations.
# 71003	13-Jan-2001	jhb	- Use sched_lock to prevent the mutex name from changing out from under us while we are copying it to the kinfo_proc structure. - Test against p_stat to see if we are blocked on a mutex. - Terminate ki_mtxname with a null char rather than ki_wmesg.
# 70317	23-Dec-2000	jake	Protect proc.p_pptr and proc.p_children/p_sibling with the proctree_lock. linprocfs not locked pending response from informal maintainer. Reviewed by: jhb, -smp@
# 69947	12-Dec-2000	jake	- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead of explicit calls to lockmgr. Also provides macros for the flags pased to specify shared, exclusive or release which map to the lockmgr flags. This is so that the use of lockmgr can be easily replaced with optimized reader-writer locks. - Add some locking that I missed the first time.
# 69896	12-Dec-2000	mckusick	Change the proc information returned from the kernel so that it no longer contains kernel specific data structures, but rather only scalar values and structures that are already part of the kernel/user interface, specifically rusage and rtprio. It no longer contains proc, session, pcred, ucred, procsig, vmspace, pstats, mtx, sigiolst, klist, callout, pasleep, or mdproc. If any of these changed in size, ps, w, fstat, gcore, systat, and top would all stop working. The new structure has over 200 bytes of unassigned space for future values to be added, yet is nearly 100 bytes smaller per entry than the structure that it replaced.
# 69368	29-Nov-2000	jhb	Save a copy of p_mtxname in e_mtxname when creating an eproc.
# 69022	22-Nov-2000	jake	Protect the following with a lockmgr lock: allproc zombproc pidhashtbl proc.p_list proc.p_hash nextpid Reviewed by: jhb Obtained from: BSD/OS and netbsd
# 65557	06-Sep-2000	jasone	Major update to the way synchronization is done in the kernel. Highlights include: * Mutual exclusion is used instead of spl(). See mutex(9). (Note: The alpha port is still in transition and currently uses both.) Per-CPU idle processes. * Interrupts are run in their own separate kernel threads and can be preempted (i386 only). Partially contributed by: BSDi (BSD/OS) Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh
# 65495	05-Sep-2000	truckman	Remove uidinfo hash table lookup and maintenance out of chgproccnt() and chgsbsize(), which are called rather frequently and may be called from an interrupt context in the case of chgsbsize(). Instead, do the hash table lookup and maintenance when credentials are changed, which is a lot less frequent. Add pointers to the uidinfo structures to the ucred and pcred structures for fast access. Pass a pointer to the credential to chgproccnt() and chgsbsize() instead of passing the uid. Add a reference count to the uidinfo structure and use it to decide when to free the structure rather than freeing the structure when the resource consumption drops to zero. Move the resource tracking code from kern_proc.c to kern_resource.c. Move some duplicate code sequences in kern_prot.c to separate helper functions. Change KASSERTs in this code to unconditional tests and calls to panic().
# 65300	31-Aug-2000	green	Casts are needed to subtract u_longs. Submitted by: tor
# 65237	30-Aug-2000	rwatson	o Centralize inter-process access control, introducing: int p_can(p1, p2, operation, privused) which allows specification of subject process, object process, inter-process operation, and an optional call-by-reference privused flag, allowing the caller to determine if privilege was required for the call to succeed. This allows jail, kern.ps_showallprocs and regular credential-based interaction checks to occur in one block of code. Possible operations are P_CAN_SEE, P_CAN_SCHED, P_CAN_KILL, and P_CAN_DEBUG. p_can currently breaks out as a wrapper to a series of static function checks in kern_prot, which should not be invoked directly. o Commented out capabilities entries are included for some checks. o Update most inter-process authorization to make use of p_can() instead of manual checks, PRISON_CHECK(), P_TRESPASS(), and kern.ps_showallprocs. o Modify suser{,_xxx} to use const arguments, as it no longer modifies process flags due to the disabling of ASU. o Modify some checks/errors in procfs so that ENOENT is returned instead of ESRCH, further improving concealment of processes that should not be visible to other processes. Also introduce new access checks to improve hiding of processes for procfs_lookup(), procfs_getattr(), procfs_readdir(). Correct a bug reported by bp concerning not handling the CREATE case in procfs_lookup(). Remove volatile flag in procfs that caused apparently spurious qualifier warnigns (approved by bde). o Add comment noting that ktrace() has not been updated, as its access control checks are different from ptrace(), whereas they should probably be the same. Further discussion should happen on this topic. Reviewed by: bde, green, phk, freebsd-security, others Approved by: bde Obtained from: TrustedBSD Project
# 65198	29-Aug-2000	green	Remove any possibility of hiwat-related race conditions by changing the chgsbsize() call to use a "subject" pointer (&sb.sb_hiwat) and a u_long target to set it to. The whole thing is splnet(). This fixes a problem that jdp has been able to provoke.
# 65034	23-Aug-2000	ps	Add a sysctl which hides all process except those that belong to the user asking for the process list. Reviewed by: peter
# 62573	04-Jul-2000	phk	Previous commit changing SYSCTL_HANDLER_ARGS violated KNF. Pointed out by: bde
# 62454	03-Jul-2000	phk	Style police catches up with rev 1.26 of src/sys/sys/sysctl.h: Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grog our sources: -sysctl_vm_zone SYSCTL_HANDLER_ARGS +sysctl_vm_zone (SYSCTL_HANDLER_ARGS)
# 61991	23-Jun-2000	dima	Fix typo (inT -> int)
# 61976	22-Jun-2000	alfred	fix races in the uidinfo subsystem, several problems existed: 1) while allocating a uidinfo struct malloc is called with M_WAITOK, it's possible that while asleep another process by the same user could have woken up earlier and inserted an entry into the uid hash table. Having redundant entries causes inconsistancies that we can't handle. fix: do a non-waiting malloc, and if that fails then do a blocking malloc, after waking up check that no one else has inserted an entry for us already. 2) Because many checks for sbsize were done as "test then set" in a non atomic manner it was possible to exceed the limits put up via races. fix: instead of querying the count then setting, we just attempt to set the count and leave it up to the function to return success or failure. 3) The uidinfo code was inlining and repeating, lookups and insertions and deletions needed to be in their own functions for clarity. Reviewed by: green
# 60938	26-May-2000	jake	Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen. Requested by: msmith and others
# 60833	23-May-2000	jake	Change the way that the queue(3) structures are declared; don't assume that the type argument to _HEAD and _ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd
# 57062	08-Feb-2000	phk	Also allow non-rot processes to setproctitle() Submitted by: Paul Saab <paul@mu.org> Approved by: jkh
# 53709	26-Nov-1999	phk	Add a sysctl to control if argv is disclosed to the world: kern.ps_argsopen It defaults to 1 which means that all users can see all argvs in ps(1). Reviewed by: Warner
# 53518	21-Nov-1999	phk	Introduce the new function p_trespass(struct proc p1, struct proc p2) which returns zero or an errno depending on the legality of p1 trespassing on p2. Replace kern_sig.c:CANSIGNAL() with call to p_trespass() and one extra signal related check. Replace procfs.h:CHECKIO() macros with calls to p_trespass(). Only show command lines to process which can trespass on the target process.
# 53275	17-Nov-1999	peter	Add e_stats (p->p_stats, from struct user->u_stats) to eproc so it's fetchable via sysctl. This saves ps having to read the u-area for stats. Be sure to recompile libkvm, ps, w, top and the usual suspects.
# 53239	16-Nov-1999	phk	Introduce commandline caching in the kernel. This fixes some nasty procfs problems for SMP, makes ps(1) run much faster, and makes ps(1) even less dependent on /proc which will aid chroot and jails alike. To disable this facility and revert to previous behaviour: sysctl -w kern.ps_arg_cache_limit=0 For full details see the current@FreeBSD.org mail-archives.
# 53225	16-Nov-1999	phk	Commit the remaining part of PR14914: Alot of the code in sys/kern directly accesses the Q_HEAD and Q_ENTRY structures for list operations. This patch makes all list operations in sys/kern use the queue(3) macros, rather than directly accessing the *Q_{HEAD,ENTRY} structures. Reviewed by: phk Submitted by: Jake Burkholder <jake@checker.org> PR: 14914
# 52465	24-Oct-1999	green	Remove a KASSERT() that has fulfilled its purpose. Note that it did cause problems by tripping on shutdown (reboot(), not the socket operation :). Cause is still uncertain, but the panic isn't really necessary here.
# 52070	09-Oct-1999	green	Implement RLIMIT_SBSIZE in the kernel. This is a per-uid sockbuf total usage limit.
# 50477	27-Aug-1999	peter	$Id$ -> $FreeBSD$
# 50031	18-Aug-1999	peter	Run queue heads have moved to TAILQ's.
# 48867	17-Jul-1999	phk	Reverse the sense of a test, dev2udev() will be much cheaper than udev2dev().
# 47271	17-May-1999	phk	Use NOUDEV for udev_t's
# 47270	17-May-1999	dfr	Change the definition of e_tdev in struct kinfo_proc from dev_t to udev_t Reviewed by: Poul-Henning Kamp <phk@critter.freebsd.dk>
# 47028	11-May-1999	phk	Divorce "dev_t" from the "major\|minor" bitmap, which is now called udev_t in the kernel but still called dev_t in userland. Provide functions to manipulate both types: major() umajor() minor() uminor() makedev() umakedev() dev2udev() udev2dev() For now they're functions, they will become in-line functions after one of the next two steps in this process. Return major/minor/makedev to macro-hood for userland. Register a name in cdevsw[] for the "filedescriptor" driver. In the kernel the udev_t appears in places where we have the major/minor number combination, (ie: a potential device: we may not have the driver nor the device), like in inodes, vattr, cdevsw registration and so on, whereas the dev_t appears where we carry around a reference to a actual device. In the future the cdevsw and the aliased-from vnode will be hung directly from the dev_t, along with up to two softc pointers for the device driver and a few houskeeping bits. This will essentially replace the current "alias" check code (same buck, bigger bang). A little stunt has been provided to try to catch places where the wrong type is being used (dev_t vs udev_t), if you see something not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if it makes a difference. If it does, please try to track it down (many hands make light work) or at least try to reproduce it as simply as possible, and describe how to do that. Without DEVT_FASCIST I belive this patch is a no-op. Stylistic/posixoid comments about the userland view of the <sys/*.h> files welcome now, from userland they now contain the end result. Next planned step: make all dev_t's refer to the same devsw[] which means convert BLK's to CHR's at the perimeter of the vnodes and other places where they enter the game (bootdev, mknod, sysctl).
# 46568	06-May-1999	peter	Add sufficient braces to keep egcs happy about potentially ambiguous if/else nesting.
# 46381	03-May-1999	billf	Add sysctl descriptions to many SYSCTL_XXXs PR: kern/11197 Submitted by: Adrian Chadd <adrian@FreeBSD.org> Reviewed by: billf(spelling/style/minor nits) Looked at by: bde(style)
# 46155	28-Apr-1999	phk	This Implements the mumbled about "Jail" feature. This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do. For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers". Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname. Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors. It generally does what one would expect, but setting up a jail still takes a little knowledge. A few notes: I have no scripts for setting up a jail, don't ask me for them. The IP number should be an alias on one of the interfaces. mount a /proc in each jail, it will make ps more useable. /proc/<pid>/status tells the hostname of the prison for jailed processes. Quotas are only sensible if you have a mountpoint per prison. There are no privisions for stopping resource-hogging. Some "#ifdef INET" and similar may be missing (send patches!) If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome! Tools, comments, patches & documentation most welcome. Have fun... Sponsored by: http://www.rndassociates.com/ Run for almost a year by: http://www.servetheweb.com/
# 44146	19-Feb-1999	luoqi	Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This is the preparation step for moving pmap storage out of vmspace proper. Reviewed by: Alan Cox <alc@cs.rice.edu> Matthew Dillion <dillon@apollo.backplane.com>
# 43311	27-Jan-1999	dillon	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
# 43208	26-Jan-1999	julian	Enable Linux threads support by default. This takes the conditionals out of the code that has been tested by various people for a while. ps and friends (libkvm) will need a recompile as some proc structure changes are made. Submitted by: "Richard Seaman, Jr." <dick@tar.com>
# 42612	13-Jan-1999	julian	Re-enable the options in ps(1) that were disabled with the Linux threads support. Submitted by: "Richard Seaman, Jr." <dick@tar.com>
# 42453	09-Jan-1999	eivind	KNFize, by bde.
# 42408	08-Jan-1999	eivind	Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as discussed on -hackers. Introduce 'KASSERT(assertion, ("panic message", args))' for simple check + panic. Reviewed by: msmith
# 41087	11-Nov-1998	truckman	I got another batch of suggestions for cosmetic changes from bde.
# 41086	11-Nov-1998	truckman	Installed the second patch attached to kern/7899 with some changes suggested by bde, a few other tweaks to get the patch to apply cleanly again and some improvements to the comments. This change closes some fairly minor security holes associated with F_SETOWN, fixes a few bugs, and removes some limitations that F_SETOWN had on tty devices. For more details, see the description on the PR. Because this patch increases the size of the proc and pgrp structures, it is necessary to re-install the includes and recompile libkvm, the vinum lkm, fstat, gcore, gdb, ipfilter, ps, top, and w. PR: kern/7899 Reviewed by: bde, elvind
# 41038	09-Nov-1998	truckman	If the session leader dies, s_leader is set to NULL and getsid() may dereference a NULL pointer, causing a panic. Instead of following s_leader to find the session id, store it in the session structure. Jukka found the following info: BTW - I just found what I have been looking for. Std 1003.1 Part 1: SYSTEM API [C LANGUAGE] section 2.2.2.80 states quite explicitly... Session lifetime: The period between when a session is created and the end of lifetime of all the process groups that remain as members of the session. So, this quite clearly tells that while there is any single process in any process group which is a member of the session, the session remains as an independent entity. Reviewed by: peter Submitted by: "Jukka A. Ukkonen" <jau@jau.tmt.tele.fi>
# 37555	11-Jul-1998	bde	Fixed printf format errors.
# 33680	20-Feb-1998	bde	Staticized. Don't depend on "implicit int".
# 33181	09-Feb-1998	eivind	Staticize.
# 33134	06-Feb-1998	eivind	Back out DIAGNOSTIC changes.
# 33108	04-Feb-1998	eivind	Turn DIAGNOSTIC into a new-style option.
# 33009	02-Feb-1998	dyson	Return the vm_map in the eproc structure, so we can support more accurate VSZ display in PS.
# 32702	22-Jan-1998	dyson	VM level code cleanups. 1) Start using TSM. Struct procs continue to point to upages structure, after being freed. Struct vmspace continues to point to pte object and kva space for kstack. u_map is now superfluous. 2) vm_map's don't need to be reference counted. They always exist either in the kernel or in a vmspace. The vmspaces are managed by reference counts. 3) Remove the "wired" vm_map nonsense. 4) No need to keep a cache of kernel stack kva's. 5) Get rid of strange looking ++var, and change to var++. 6) Change more data structures to use our "zone" allocator. Added struct proc, struct vmspace and struct vnode. This saves a significant amount of kva space and physical memory. Additionally, this enables TSM for the zone managed memory. 7) Keep ioopt disabled for now. 8) Remove the now bogus "single use" map concept. 9) Use generation counts or id's for data structures residing in TSM, where it allows us to avoid unneeded restart overhead during traversals, where blocking might occur. 10) Account better for memory deficits, so the pageout daemon will be able to make enough memory available (experimental.) 11) Fix some vnode locking problems. (From Tor, I think.) 12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp. (experimental.) 13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c code. Use generation counts, get rid of unneded collpase operations, and clean up the cluster code. 14) Make vm_zone more suitable for TSM. This commit is partially as a result of discussions and contributions from other people, including DG, Tor Egge, PHK, and probably others that I have forgotten to attribute (so let me know, if I forgot.) This is not the infamous, final cleanup of the vnode stuff, but a necessary step. Vnode mgmt should be correct, but things might still change, and there is still some missing stuff (like ioopt, and physical backing of non-merged cache files, debugging of layering concepts.)
# 30354	12-Oct-1997	phk	Last major round (Unless Bruce thinks of somthing :-) of malloc changes. Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them. A couple of finer points by: bde
# 30309	11-Oct-1997	phk	Distribute and statizice a lot of the malloc M_* types. Substantial input from: bde
# 27845	02-Aug-1997	bde	Removed unused #includes.
# 26991	27-Jun-1997	tegge	Fill in some extra fields in the eproc structure. gdb uses this information to determine where the data segment in core dumps should be mapped. Reviewed by: Peter Wemm <peter@spinner.dialix.com.au>
# 24203	24-Mar-1997	bde	Don't include <sys/ioctl.h> in the kernel. Stage 1: don't include it when it is not used. In most cases, the reasons for including it went away when the special ioctl headers became self-sufficient.
# 22975	22-Feb-1997	peter	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
# 22521	10-Feb-1997	dyson	This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
# 21673	14-Jan-1997	jkh	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# 18297	14-Sep-1996	bde	Attached simple external ddb commands `show rtc', `show pgrpdump' and `show cbstat'. The pgrpdump code was previously controlled by `#ifdef DEBUG'.
# 17040	09-Jul-1996	wollman	Quiet a couple of -Wunused warnings.
# 16322	12-Jun-1996	gpalmer	Clean up -Wunused warnings. Reviewed by: bde
# 16160	06-Jun-1996	phk	Fix the same problem that davidg fixed in -stable some days ago and restructure sysctl stuff a bit. KERN_PROC_PID now uses pfind().
# 15985	29-May-1996	dg	Fix a panic caused by (proc)->p_session being dereferenced for a process that was exiting.
# 15110	07-Apr-1996	bde	Declared pgrpdump() properly.
# 14529	11-Mar-1996	hsu	From Lite2: proc LIST changes. Reviewed by: david & bde
# 13154	01-Jan-1996	peter	fill in kinfo_eproc.e_login - otherwise a sysctl to read the eprocs wont get the login names, and "ps -ax -O login" will return an empty column under the login name.
# 12819	14-Dec-1995	phk	A Major staticize sweep. Generates a couple of warnings that I'll deal with later. A number of unused vars removed. A number of unused procs removed or #ifdefed.
# 12662	07-Dec-1995	dg	Untangled the vm.h include file spaghetti.
# 12577	02-Dec-1995	bde	Completed function declarations and/or added prototypes.
# 12281	14-Nov-1995	phk	Hmm, I seem to have got all my patches screwed up anyway. Too bad. this is where the proctable stuff went.
# 8876	30-May-1995	rgrimes	Remove trailing whitespace.
# 3485	09-Oct-1994	phk	Cosmetics. related to getting prototypes into view.
# 3451	09-Oct-1994	dg	Got rid of map.h. It's a leftover from the rmap code, and we use rlists. Changed swapmap into swaplist.
# 3291	02-Oct-1994	dg	"idle priority" support. Based on code from Henrik Vestergaard Draboel, but substantially rewritten by me.
# 3098	25-Sep-1994	phk	While in the real world, I had a bad case of being swapped out for a lot of cycles. While waiting there I added a lot of the extra ()'s I have, (I have never used LISP to any extent). So I compiled the kernel with -Wall and shut up a lot of "suggest you add ()'s", removed a bunch of unused var's and added a couple of declarations here and there. Having a lap-top is highly recommended. My kernel still runs, yell at me if you kernel breaks.
# 2441	01-Sep-1994	dg	Realtime priority scheduling support. Submitted by: Henrik Vestergaard Draboel
# 2112	18-Aug-1994	wollman	Fix up some sloppy coding practices: - Delete redundant declarations. - Add -Wredundant-declarations to Makefile.i386 so they don't come back. - Delete sloppy COMMON-style declarations of uninitialized data in header files. - Add a few prototypes. - Clean up warnings resulting from the above. NB: ioconf.c will still generate a redundant-declaration warning, which is unavoidable unless somebody volunteers to make `config' smarter.
# 1817	02-Aug-1994	dg	Added $Id$
# 1549	25-May-1994	rgrimes	The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
# 1542	24-May-1994	rgrimes	This commit was generated by cvs2svn to compensate for changes in r1541, which included commits to RCS files with non-trunk default branches.
# 1541	24-May-1994	rgrimes	BSD 4.4 Lite Kernel Sources