Cross Reference: /freebsd-current/sys/kern/kern

History log of /freebsd-current/sys/kern/kern_resource.c
Revision	Date	Author	Comments
# fdafd315	24-Nov-2023	Warner Losh <imp@FreeBSD.org>	sys: Automated cleanup of cdefs and other formatting Apply the following automated changes to try to eliminate no-longer-needed sys/cdefs.h includes as well as now-empty blank lines in a row. Remove /^#if.\n#endif.\n#include\s+<sys/cdefs.h>.\n/ Remove /\n+#include\s+<sys/cdefs.h>.\n+#if.\n#endif.\n+/ Remove /\n+#if.\n#endif.\n+/ Remove /^#if.\n#endif.\n/ Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/ Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/ Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/ Sponsored by: Netflix
# 29363fb4	23-Nov-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
# 685dc743	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s]__FBSDID$"\$FreeBSD\$"$;?\s*\n/
# 6419ed7e	06-Jul-2023	Doug Moore <dougm@FreeBSD.org>	inline_fls: drop compile-time check HAVE_INLINE_FLSLL is #defined always. This change assumes that where __HAVE_INLINE_FLSLL is tested, the two leading underscores are a mistake, and that the code will be better for using the efficient flsll implementation. Reviewed by: markj, mhorne Differential Revision: https://reviews.freebsd.org/D40705
# bbe62559	19-Aug-2022	Mateusz Guzik <mjg@FreeBSD.org>	rlimit: line up with other clean up in thread_reap_domain NFC
# f2eb09b0	26-Jul-2022	Dimitry Andric <dim@FreeBSD.org>	Adjust function definitions in kern_resource.c to avoid clang 15 warnings With clang 15, the following -Werror warnings are produced: sys/kern/kern_resource.c:1212:10: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] lim_alloc() ^ void sys/kern/kern_resource.c:1365:11: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] uihashinit() ^ void This is because lim_alloc() and uihashinit() are declared with (void) argument lists, but defined with empty argument lists. Make the definitions match the declarations. MFC after: 3 days
# becaf643	14-Feb-2022	John Baldwin <jhb@FreeBSD.org>	Use vmspace->vm_stacktop in place of sv_usrstack in more places. Reviewed by: markj Obtained from: CheriBSD Differential Revision: https://reviews.freebsd.org/D34174
# 93288e24	01-Feb-2022	Mateusz Guzik <mjg@FreeBSD.org>	Employ thread_cow_synced in setrlimit In order to avoid proc lock/unlock on next kernel entry.
# 8a0cb04d	01-Feb-2022	Mateusz Guzik <mjg@FreeBSD.org>	Add lim_cowsync, similar to crcowsync
# 5a8413e7	17-Jan-2022	Mark Johnston <markj@FreeBSD.org>	setrlimit: Remove special handling for RLIMIT_STACK with a stack gap This will not be required with a forthcoming reimplementation of ASLR stack randomization. Moreover, this change was not sufficient to enable the use of a stack size limit smaller than the stack gap itself. PR: 260303 Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33704
# bacb140f	14-Jan-2022	Simon J. Gerraty <sjg@FreeBSD.org>	Ignore calcru: runtime went backwards for vm_guest VM's have little control over CPU speed, don't make matters worse by constantly spaming console. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D33902
# a9545eed	09-Dec-2021	Florian Walpen <dev@submerge.ch>	Add idle priority scheduling privilege group to MAC/priority Add an idletime user group that allows non-root users to run processes with idle scheduling priority. Privileges are granted by a MAC policy in the mac_priority module. For this purpose, the kernel privilege PRIV_SCHED_IDPRIO was added to sys/priv.h (kernel module ABI change). Deprecate the system wide sysctl(8) knob security.bsd.unprivileged_idprio which lets any user run idle priority processes, regardless of context. While the knob is still working, it is marked as deprecated in the description and in the man pages. MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D33338
# a20a2450	09-Dec-2021	Florian Walpen <dev@submerge.ch>	Add PRIV_SCHED_IDPRIO The privilege allows the holder to assign idle priority type to thread or process. MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D33338
# 638c5fa8	29-Nov-2021	Brooks Davis <brooks@FreeBSD.org>	syscalls: normalize (get\|set)rlimit Declare normal <foo>_args structs rather than going out of the way to declare __<foo>_args. Reviewed by: kib, imp
# 889b56c8	13-Oct-2021	Dawid Gorecki <dgr@semihalf.com>	setrlimit: Take stack gap into account. Calling setrlimit with stack gap enabled and with low values of stack resource limit often caused the program to abort immediately after exiting the syscall. This happened due to the fact that the resource limit was calculated assuming that the stack started at sv_usrstack, while with stack gap enabled the stack is moved by a random number of bytes. Save information about stack size in struct vmspace and adjust the rlim_cur value. If the rlim_cur and stack gap is bigger than rlim_max, then the value is truncated to rlim_max. PR: 253208 Reviewed by: kib Obtained from: Semihalf Sponsored by: Stormshield MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D31516
# af29f399	28-Jul-2021	Dmitry Chagin <dchagin@FreeBSD.org>	umtx: Split umtx.h on two counterparts. To prevent umtx.h polluting by future changes split it on two headers: umtx.h - ABI header for userspace; umtxvar.h - the kernel staff. While here fix umtx_key_match style. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31248 MFC after: 2 weeks
# 0659df6f	12-Jan-2021	Konstantin Belousov <kib@FreeBSD.org>	vm_map_protect: allow to set prot and max_prot in one go. This prevents a situation where other thread modifies map entries permissions between setting max_prot, then relocking, then setting prot, confusing the operation outcome. E.g. you can get an error that is not possible if operation is performed atomic. Also enable setting rwx for max_prot even if map does not allow to set effective rwx protection. Reviewed by: brooks, markj (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28117
# fb8ab680	14-Nov-2020	Mateusz Guzik <mjg@FreeBSD.org>	thread: batch resource limit free calls
# 5c5ca843	10-Nov-2020	Mateusz Guzik <mjg@FreeBSD.org>	Allow rtprio_thread to operate on threads of any process This in particular unbreaks rtkit. The limitation was a leftover of previous state, to quote a comment: /* * Though lwpid is unique, only current process is supported * since there is no efficient way to look up a LWP yet. */ Long since then a global tid hash was introduced to remedy the problem. Permission checks still apply. Submitted by: greg_unrelenting.technology (Greg V) Differential Revision: https://reviews.freebsd.org/D27158
# 6fed89b1	01-Sep-2020	Mateusz Guzik <mjg@FreeBSD.org>	kern: clean up empty lines in .c and .h files
# 3ff65f71	30-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	Remove duplicated empty lines from kern/*.c No functional changes.
# ca603bb1	12-Jan-2020	Edward Tomasz Napierala <trasz@FreeBSD.org>	dd kern_getpriority(), make Linuxulator use it. Reviewed by: kib, emaste MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22842
# 7a0ef283	12-Jan-2020	Edward Tomasz Napierala <trasz@FreeBSD.org>	Add kern_setpriority(), use it in Linuxulator. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22841
# 61a74c5c	15-Dec-2019	Jeff Roberson <jeff@FreeBSD.org>	schedlock 1/4 Eliminate recursion from most thread_lock consumers. Return from sched_add() without the thread_lock held. This eliminates unnecessary atomics and lock word loads as well as reducing the hold time for scheduler locks. This will eventually allow for lockless remote adds. Discussed with: kib Reviewed by: jhb Tested by: pho Differential Revision: https://reviews.freebsd.org/D22626
# f5526339	17-Mar-2019	Andrew Gallatin <gallatin@FreeBSD.org>	Fix a typo introduced in r344133 The line was misedited to change tt to st instead of changing ut to st. The use of st as the denominator in mul64_by_fraction() will lead to an integer divide fault in the intr proc (the process holding ithreads) where st will be 0. This divide by 0 happens after the total runtime for all ithreads exceeds 76 hours. Submitted by: bde
# 23e5e43c	14-Feb-2019	Bruce Evans <bde@FreeBSD.org>	Finish the fix for overflow in calcru1(). The previous fix was unnecessarily very slow up to 105 hours where the simple formula used previously worked, and unnecessarily slow by a factor of about 5/3 up to 388 days, and didn't work above 388 days. 388 days is not a long time, since it is a reasonable uptime, and for processes the times being calculated are aggregated over all threads, so with N CPUs running the same thread a runtime of 388 days is reachable after only 388 / N physical days. The PRs document overflow at 388 days, but don't try to fix it. Use the simple formula up to 76 hours. Then use a complicated general method that reduces to the simple formula up to a bit less than 105 hours, then reduces to the previous method without its extra work up to almost 388 days, then does more complicated reductions, usually many bits at a time so that this is not slow. This works up to half of maximum representable time (292271 years), with accumulated rounding errors of at most 32 usec. amd64 can do all this with no avoidable rounding errors in an inline asm with 2 instructions, but this is too special to use. __uint128_t can do the same with 100's of instructions on 64-bit arches. Long doubles with at least 64 bits of precision are the easiest method to use on i386 userland, but are hard to use in the kernel. PR: 76972 and duplicates Reviewed by: kib
# e0d164c7	10-Feb-2019	Conrad Meyer <cem@FreeBSD.org>	Prevent overflow for usertime/systime in caclru1 PR: 76972 and duplicates Reported by: Dr. Christopher Landauer <cal AT aero.org>, Steinar Haug <sthaug AT nethelp.no> Submitted by: Andrey Zonov <andrey AT zonov.org> (earlier version) MFC after: 2 weeks
# 73e62bc9	10-Dec-2018	Mateusz Guzik <mjg@FreeBSD.org>	Make lim_cur inline if possible. It is a function call only to accomodate some ABIs which install a hook. They only care for 3 types of limits: DATA, STACK, VMEM Instead of always calling the func, see at compilation time if the requested limit is something else and just do the read if so. Sponsored by: The FreeBSD Foundation
# 6ff4688b	07-Dec-2018	Mateusz Guzik <mjg@FreeBSD.org>	Replace hand-rolled unrefs if > 1 with refcount_release_if_not_last Sponsored by: The FreeBSD Foundation
# e8bb589d	04-Oct-2018	Matt Macy <mmacy@FreeBSD.org>	eliminate locking surrounding ui_vmsize and swap reserve by using atomics Change swap_reserve and swap_total to be in units of pages so that swap reservations can be done using only atomics instead of using a single global mutex for swap_reserve and a single mutex for all processes running under the same uid for uid accounting. Results in mmap speed up and a 70% increase in brk calls / second. Reviewed by: alc@, markj@, kib@ Approved by: re (delphij@) Differential Revision: https://reviews.freebsd.org/D16273
# 6469bdcd	06-Apr-2018	Brooks Davis <brooks@FreeBSD.org>	Move most of the contents of opt_compat.h to opt_global.h. opt_compat.h is mentioned in nearly 180 files. In-progress network driver compabibility improvements may add over 100 more so this is closer to "just about everywhere" than "only some files" per the guidance in sys/conf/options. Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h is created on all architectures. Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the set of compiled files. Reviewed by: kib, cem, jhb, jtl Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14941
# 2da93c21	04-Jan-2018	John Baldwin <jhb@FreeBSD.org>	Always use atomic_fetchadd() when updating per-user accounting values. This avoids re-reading a variable after it has been updated via an atomic op. It is just a cosmetic cleanup as the read value was only used to control a diagnostic printf that should rarely occur (if ever). Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D13768
# 51369649	20-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
# 5949c7e5	31-Oct-2017	Mateusz Guzik <mjg@FreeBSD.org>	Save on uihash table locking by checking if the caller already uses the struct In particular with poudriere this saves about 90% of lookups.
# 3e85b721	16-May-2017	Ed Maste <emaste@FreeBSD.org>	Remove register keyword from sys/ and ANSIfy prototypes A long long time ago the register keyword told the compiler to store the corresponding variable in a CPU register, but it is not relevant for any compiler used in the FreeBSD world today. ANSIfy related prototypes while here. Reviewed by: cem, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D10193
# 69a28758	15-Sep-2016	Ed Maste <emaste@FreeBSD.org>	Renumber license clauses in sys/kern to avoid skipping #3
# 1bdbd705	28-Feb-2016	Konstantin Belousov <kib@FreeBSD.org>	Implement process-shared locks support for libthr.so.3, without breaking the ABI. Special value is stored in the lock pointer to indicate shared lock, and offline page in the shared memory is allocated to store the actual lock. Reviewed by: vangyzen (previous version) Discussed with: deischen, emaste, jhb, rwatson, Martin Simmons <martin@lispworks.com> Tested by: pho Sponsored by: The FreeBSD Foundation
# 20b4c1d2	22-Dec-2015	Enji Cooper <ngie@FreeBSD.org>	Fold lim_shared into lim_copy to mute a -Wunused compiler warning from clang when the kernel is compiled without INVARIANTS Differential Revision: https://reviews.freebsd.org/D4683 Reviewed by: kib, jhb MFC after: 1 week Sponsored by: EMC / Isilon Storage Division
# 15db3c07	14-Nov-2015	Edward Tomasz Napierala <trasz@FreeBSD.org>	Speed up rctl operation with large rulesets, by holding the lock during iteration instead of relocking it for each traversed rule. Reviewed by: mjg@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D4110
# cd672ca6	16-Jul-2015	Mateusz Guzik <mjg@FreeBSD.org>	Get rid of lim_update_thread and cred_update_thread. Their primary use was in thread_cow_update to free up old resources. Freeing had to be done with proc lock held and _cow_ funcs already knew how to free old structs.
# 7150ce74	24-Jun-2015	Mateusz Guzik <mjg@FreeBSD.org>	rlimit: deduplicate code in chg* functions
# f6f6d240	10-Jun-2015	Mateusz Guzik <mjg@FreeBSD.org>	Implement lockless resource limits. Use the same scheme implemented to manage credentials. Code needing to look at process's credentials (as opposed to thred's) is provided with *_proc variants of relevant functions. Places which possibly had to take the proc lock anyway still use the proc pointer to access limits.
# 316b3843	15-Apr-2015	Konstantin Belousov <kib@FreeBSD.org>	Implement support for binary to requesting specific stack size for the initial thread. It is read by the ELF image activator as the virtual size of the PT_GNU_STACK program header entry, and can be specified by the linker option -z stack-size in newer binutils. The soft RLIMIT_STACK is auto-increased if possible, to satisfy the binary' request. Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 5c7bebf9	26-Nov-2014	Konstantin Belousov <kib@FreeBSD.org>	The process spin lock currently has the following distinct uses: - Threads lifetime cycle, in particular, counting of the threads in the process, and interlocking with process mutex and thread lock. The main reason of this is that turnstile locks are after thread locks, so you e.g. cannot unlock blockable mutex (think process mutex) while owning thread lock. - Virtual and profiling itimers, since the timers activation is done from the clock interrupt context. Replace the p_slock by p_itimmtx and PROC_ITIMLOCK(). - Profiling code (profil(2)), for similar reason. Replace the p_slock by p_profmtx and PROC_PROFLOCK(). - Resource usage accounting. Need for the spinlock there is subtle, my understanding is that spinlock blocks context switching for the current thread, which prevents td_runtime and similar fields from changing (updates are done at the mi_switch()). Replace the p_slock by p_statmtx and PROC_STATLOCK(). The split is done mostly for code clarity, and should not affect scalability. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
# dff9862c	23-Nov-2014	Mateusz Guzik <mjg@FreeBSD.org>	ifdef RACCT ui_racct_foreach and struct uidinfo's ui_racct Change racct_ create and destroy to macros evaluating to nothing without RACCT so that their callers passing ui_racct don't have to be ifdefed.
# 8d866b10	27-Oct-2014	Mateusz Guzik <mjg@FreeBSD.org>	Tidy up functions related to uidinfo management. - reference found uidinfo in uilookup - reduce nesting by handling shorter cases first
# 8101958c	27-Oct-2014	Mateusz Guzik <mjg@FreeBSD.org>	De-k&r-ify function definitions in kern/kern_resource.c No functional changes.
# 675c3507	24-Oct-2014	Mateusz Guzik <mjg@FreeBSD.org>	rlimit: plug duplicate assertion counter sanity is already checked by refcount_release.
# c2a48c0d	13-Dec-2013	Mateusz Guzik <mjg@FreeBSD.org>	rlimit: avoid unnecessary copying of rlimits If refcount is 1 just modify rlimits in place. MFC after: 2 weeks
# 3318a9c8	13-Dec-2013	Mateusz Guzik <mjg@FreeBSD.org>	rlimit: add and utilize lim_shared MFC after: 2 weeks
# 9110db81	21-Oct-2013	Konstantin Belousov <kib@FreeBSD.org>	Add a resource limit for the total number of kqueues available to the user. Kqueue now saves the ucred of the allocating thread, to correctly decrement the counter on close. Under some specific and not real-world use scenario for kqueue, it is possible for the kqueues to consume memory proportional to the square of the number of the filedescriptors available to the process. Limit allows administrator to prevent the abuse. This is kernel-mode side of the change, with the user-mode enabling commit following. Reported and tested by: pho Discussed with: jmg Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 9a2bff7c	06-Mar-2013	Ian Lepore <ian@FreeBSD.org>	Call sched_prio() to immediately change the priority of the thread in response to an rtprio_thread() call, when the priority is different than the old priority, and either the old or the new priority class is not RTP_PRIO_NORMAL (timeshare). The reasoning for the second half of the test is that if it's a change in timeshare priority, then the scheduler is going to adjust that priority in a way that completely wipes out the requested change anyway, so what's the point? (If that's not true, then allowing a thread to change its own timeshare priority would subvert the scheduler's adjustments and let a cpu-bound thread monopolize the cpu; if allowed at all, that should require priveleges.) On the other hand, if either the old or new priority class is not timeshare, then the scheduler doesn't make automatic adjustments, so we should honor the request and make the priority change right away. The reason the old class gets caught up in this is the very reason for this change: when thread A changes the priority of its child thread B from idle back to timeshare, thread B never actually gets moved to a timeshare-range run queue unless there are some idle cycles available to allow it to first get scheduled again as an idle thread. Reviewed by: jhb@
# 4601bab1	04-Mar-2013	Davide Italiano <davide@FreeBSD.org>	MFcalloutng (r244251 with minor changes): Specify that precision of 0.5s is enough for resource limitation. Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo, marius, ian, markj, Fabian Keil
# 8854fe39	22-Jan-2012	Mikolaj Golub <trociny@FreeBSD.org>	Change kern.proc.rlimit sysctl to: - retrive only one, specified limit for a process, not the whole array, as it was previously (the sysctl has been added recently and has not been backported to stable yet, so this change is ok); - allow to set a resource limit for another process. Submitted by: Andrey Zonov <andrey at zonov.org> Discussed with: kib Reviewed by: kib MFC after: 2 weeks
# 948c4609	05-Jan-2012	John Baldwin <jhb@FreeBSD.org>	Fix a logic bug in change 228207 in the check for a thread's new user priority being a realtime priority. MFC after: 3 days
# 9910b854	13-Dec-2011	Eitan Adler <eadler@FreeBSD.org>	- Add a sysctl to allow non-root users the ability to set idle priorities. - While here fix up some style nits. Discussed with: cperciva (breifly) Reviewed by: pjd (earlier version) Reviewed by: bde Approved by: jhb MFC after: 1 month
# 593dd43e	02-Dec-2011	John Baldwin <jhb@FreeBSD.org>	When changing the user priority of a thread, change the real priority in addition to the user priority for threads whose current real priority is equal to the previous user priority or if the new priority is a real-time priority. This allows priority changes of other threads to have an immediate effect. MFC after: 2 weeks
# bde886fb	07-Nov-2011	Mikolaj Golub <trociny@FreeBSD.org>	In lim_fork() assert that processes locks are held. Suggested by: kib
# 8451d0dd	16-Sep-2011	Kip Macy <kmacy@FreeBSD.org>	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
# 2417d97e	18-Jul-2011	John Baldwin <jhb@FreeBSD.org>	- Export each thread's individual resource usage in in struct kinfo_proc's ki_rusage member when KERN_PROC_INC_THREAD is passed to one of the process sysctls. - Correctly account for the current thread's cputime in the thread when doing the runtime fixup in calcru(). - Use TIDs as the key to lookup the previous thread to compute IO stat deltas in IO mode in top when thread display is enabled. Reviewed by: kib Approved by: re (kib)
# e806d352	06-Apr-2011	John Baldwin <jhb@FreeBSD.org>	Fix several places to ignore processes that are not yet fully constructed. MFC after: 1 week
# 097055e2	29-Mar-2011	Edward Tomasz Napierala <trasz@FreeBSD.org>	Add racct. It's an API to keep per-process, per-jail, per-loginclass and per-loginclass resource accounting information, to be used by the new resource limits code. It's connected to the build, but the code that actually calls the new functions will come later. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)
# 8e6fa660	24-Mar-2011	John Baldwin <jhb@FreeBSD.org>	Fix some locking nits with the p_state field of struct proc: - Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL in fork to honor the locking requirements. While here, expand the scope of the PROC_LOCK() on the new process (p2) to avoid some LORs. Previously the code was locking the new child process (p2) after it had locked the parent process (p1). However, when locking two processes, the safe order is to lock the child first, then the parent. - Fix various places that were checking p_state against PRS_NEW without having the process locked to use PROC_LOCK(). Every place was already locking the process, just after the PRS_NEW check. - Remove or reduce the use of PROC_SLOCK() for places that were checking p_state against PRS_NEW. The PROC_LOCK() alone is sufficient for reading the current state. - Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once. MFC after: 1 week
# c8e368a9	29-Dec-2010	David Xu <davidxu@FreeBSD.org>	- Follow r216313, the sched_unlend_user_prio is no longer needed, always use sched_lend_user_prio to set lent priority. - Improve pthread priority-inherit mutex, when a contender's priority is lowered, repropagete priorities, this may cause mutex owner's priority to be lowerd, in old code, mutex owner's priority is rise-only.
# 36b4cd24	17-Dec-2010	John Baldwin <jhb@FreeBSD.org>	Add back a bounds check on valid idle priorities that was lost in an earlier commit. While here, move the thread lock down in rtp_to_pri(). It is not needed for all of the priority value checks and the computation of newpri. Reported by: swell.k @ gmail MFC after: 3 days
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# c4965cfc	18-Oct-2010	Ed Maste <emaste@FreeBSD.org>	We've already set p = td->td_proc, so use it.
# cf7d9a8c	08-Oct-2010	David Xu <davidxu@FreeBSD.org>	Create a global thread hash table to speed up thread lookup, use rwlock to protect the table. In old code, thread lookup is done with process lock held, to find a thread, kernel has to iterate through process and thread list, this is quite inefficient. With this change, test shows in extreme case performance is dramatically improved. Earlier patch was reviewed by: jhb, julian
# 1a996ed1	18-Jul-2010	Edward Tomasz Napierala <trasz@FreeBSD.org>	Revert r210225 - turns out I was wrong; the "/*-" is not license-only thing; it's also used to indicate that the comment should not be automatically rewrapped. Explained by: cperciva@
# 805cc58a	18-Jul-2010	Edward Tomasz Napierala <trasz@FreeBSD.org>	The "/*-" comment marker is supposed to denote copyrights. Remove non-copyright occurences from sys/sys/ and sys/kern/.
# eea4ac8b	18-Jul-2010	Edward Tomasz Napierala <trasz@FreeBSD.org>	Remove outdated comment and move part of it into more applicable place.
# 60ae52f7	21-Jun-2010	Ed Schouten <ed@FreeBSD.org>	Use ISO C99 integer types in sys/kern where possible. There are only about 100 occurences of the BSD-specific u_int*_t datatypes in sys/kern. The ISO C99 integer types are used here more often.
# f3e1e28b	26-May-2010	Konstantin Belousov <kib@FreeBSD.org>	MFC r208488: Fix the double counting of the last process thread td_incruntime on exit, that is done once in thread_exit() and the second time in proc_reap(), by clearing td_incruntime. Approved by: re (kensmith)
# 41fd9c63	24-May-2010	Konstantin Belousov <kib@FreeBSD.org>	Fix the double counting of the last process thread td_incruntime on exit, that is done once in thread_exit() and the second time in proc_reap(), by clearing td_incruntime. Use the opportunity to revert to the pre-RUSAGE_THREAD exporting of ruxagg() instead of ruxagg_locked() and use it from thread_exit(). Diagnosed and tested by: neel MFC after: 3 days
# c193de56	11-May-2010	Konstantin Belousov <kib@FreeBSD.org>	MFC r207468: Extract thread_lock()/ruxagg()/thread_unlock() fragment into utility function ruxagg_tlock(). Convert the definition of kern_getrusage() to ANSI C. MFC r207602: Implement RUSAGE_THREAD. Add td_rux to keep extended runtime and ticks information for thread to allow calcru1() (re)use. Rename ruxagg()->ruxagg_locked(), ruxagg_tlock()->ruxagg() [1]. The ruxagg_locked() function no longer clears thread ticks nor td_incruntime. Not an MFC: the td_rux is added to the end of struct thread to keep the KBI. Explicit bzero() of td_rux is added to new thread initialization points.
# bed4c524	03-May-2010	Konstantin Belousov <kib@FreeBSD.org>	Implement RUSAGE_THREAD. Add td_rux to keep extended runtime and ticks information for thread to allow calcru1() (re)use. Rename ruxagg()->ruxagg_locked(), ruxagg_tlock()->ruxagg() [1]. The ruxagg_locked() function no longer clears thread ticks nor td_incruntime. Requested by: attilio [1] Discussed with: attilio, bde Reviewed by: bde Based on submission by: Alexander Krizhanovsky <ak natsys-lab com> MFC after: 1 week X-MFC-Note: td_rux shall be moved to the end of struct thread
# 3087dc40	01-May-2010	Konstantin Belousov <kib@FreeBSD.org>	Extract thread_lock()/ruxagg()/thread_unlock() fragment into utility function ruxagg_tlock(). Convert the definition of kern_getrusage() to ANSI C. Submitted by: Alexander Krizhanovsky <ak natsys-lab com> MFC after: 1 week
# 9b355dc7	05-Apr-2010	Randall Stewart <rrs@FreeBSD.org>	MFC of 204670: ------------------------- sched_getparam was just plain broke for time-share processes. It did not return an error but instead just let garbage be passed back. This I fix so it actually properly translates the priority the process is at to a posix's high means more priority. I also fix it so that if the ULE scheduler has bumped it up to a realtime process you get back a sane value i.e. the highest priority (63 for time-share). sched_setscheduler() had the setting of the timeshare class priority disabled. With some notes about rejecting the posix high numbers is greater priority and use nice instead. This fix also adjusts that to work, with the cavet that a t-s process may well get bumped up or down i.e. the setscheduler() will NOT change the nice value only the current priority. I think this is reasonable considering if the user wants to play with nice then he can. At least all the posix'ish interfaces now respond sanely. -----------------------
# bec67fd3	03-Mar-2010	Randall Stewart <rrs@FreeBSD.org>	sched_getparam was just plain broke for time-share processes. It did not return an error but instead just let garbage be passed back. This I fix so it actually properly translates the priority the process is at to a posix's high means more priority. I also fix it so that if the ULE scheduler has bumped it up to a realtime process you get back a sane value i.e. the highest priority (63 for time-share). sched_setscheduler() had the setting of the timeshare class priority disabled. With some notes about rejecting the posix high numbers is greater priority and use nice instead. This fix also adjusts that to work, with the cavet that a t-s process may well get bumped up or down i.e. the setscheduler() will NOT change the nice value only the current priority. I think this is reasonable considering if the user wants to play with nice then he can. At least all the posix'ish interfaces now respond sanely. MFC after: 3 weeks
# 3364c323	23-Jun-2009	Konstantin Belousov <kib@FreeBSD.org>	Implement global and per-uid accounting of the anonymous memory. Add rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved for the uid. The accounting information (charge) is associated with either map entry, or vm object backing the entry, assuming the object is the first one in the shadow chain and entry does not require COW. Charge is moved from entry to object on allocation of the object, e.g. during the mmap, assuming the object is allocated, or on the first page fault on the entry. It moves back to the entry on forks due to COW setup. The per-entry granularity of accounting makes the charge process fair for processes that change uid during lifetime, and decrements charge for proper uid when region is unmapped. The interface of vm_pager_allocate(9) is extended by adding struct ucred *, that is used to charge appropriate uid when allocation if performed by kernel, e.g. md(4). Several syscalls, among them is fork(2), may now return ENOMEM when global or per-uid limits are enforced. In collaboration with: pho Reviewed by: alc Approved by: re (kensmith)
# 300fa5ef	23-Oct-2008	David Xu <davidxu@FreeBSD.org>	Don't rearm callout if the process is exiting, it may leak a callout because callout_drain() only waits for running callout, but not disable it if it is rearmed.
# 1ede983c	23-Oct-2008	Dag-Erling Smørgrav <des@FreeBSD.org>	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# c27991e8	05-Sep-2008	Ed Schouten <ed@FreeBSD.org>	Fix a small typo in a comment in calcru1(). The word "happene" should read "happened". Submitted by: Jille Timmermans <jille quis cx>
# bc093719	20-Aug-2008	Ed Schouten <ed@FreeBSD.org>	Integrate the new MPSAFE TTY layer to the FreeBSD operating system. The last half year I've been working on a replacement TTY layer for the FreeBSD kernel. The new TTY layer was designed to improve the following: - Improved driver model: The old TTY layer has a driver model that is not abstract enough to make it friendly to use. A good example is the output path, where the device drivers directly access the output buffers. This means that an in-kernel PPP implementation must always convert network buffers into TTY buffers. If a PPP implementation would be built on top of the new TTY layer (still needs a hooks layer, though), it would allow the PPP implementation to directly hand the data to the TTY driver. - Improved hotplugging: With the old TTY layer, it isn't entirely safe to destroy TTY's from the system. This implementation has a two-step destructing design, where the driver first abandons the TTY. After all threads have left the TTY, the TTY layer calls a routine in the driver, which can be used to free resources (unit numbers, etc). The pts(4) driver also implements this feature, which means posix_openpt() will now return PTY's that are created on the fly. - Improved performance: One of the major improvements is the per-TTY mutex, which is expected to improve scalability when compared to the old Giant locking. Another change is the unbuffered copying to userspace, which is both used on TTY device nodes and PTY masters. Upgrading should be quite straightforward. Unlike previous versions, existing kernel configuration files do not need to be changed, except when they reference device drivers that are listed in UPDATING. Obtained from: //depot/projects/mpsafetty/... Approved by: philip (ex-mentor) Discussed: on the lists, at BSDCan, at the DevSummit Sponsored by: Snow B.V., the Netherlands dcons(4) fixed by: kan
# 4682cd0b	19-Mar-2008	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Remove extra uihold() call that accidentally sneak in during perforce change @125544.
# 374ae2a3	19-Mar-2008	Jeff Roberson <jeff@FreeBSD.org>	- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.
# 709446e7	16-Mar-2008	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Whitespace cleanups.
# 1b072fbc	16-Mar-2008	Pawel Jakub Dawidek <pjd@FreeBSD.org>	- Use wait-free method to manage ui_sbsize and ui_proccnt fields in the uidinfo structure. This entirely removes contention observed on the ui_mtxp mutex (as it is now gone). - Convert the uihashtbl_mtx mutex to a rwlock, as most of the time we just need to read-lock it. Reviewed by: jhb, jeff, kris & others Tested by: kris
# e0567707	16-Mar-2008	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Style fixes.
# 67e83b07	16-Mar-2008	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Fix information leak. We can find PIDs of running processes from within a jail, etc. by simply calling setpriority(PRIO_PROCESS, <PID>, 0) and checking the return value: 0 means that the process exists and -1 that it doesn't exist. Reviewed by: rwatson MFC after: 1 week
# 6617724c	12-Mar-2008	Jeff Roberson <jeff@FreeBSD.org>	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
# d92909c1	10-Jan-2008	Robert Watson <rwatson@FreeBSD.org>	Don't zero td_runtime when billing thread CPU usage to the process; maintain a separate td_incruntime to hold unbilled CPU usage for the thread that has the previous properties of td_runtime. When thread information is requested using the thread monitoring sysctls, export thread td_runtime instead of process rusage runtime in kinfo_proc. This restores the display of individual ithread and other kernel thread CPU usage since inception in ps -H and top -SH, as well for libthr user threads, valuable debugging information lost with the move to try kthreads since they are no longer independent processes. There is universal agreement that we should rewrite the process and thread export sysctls, but this commit gets things going a bit better in the mean time. Likewise, there are resevations about the continued validity of statclock given the speed of modern processors. Reviewed by: attilio, emaste, jhb, julian
# 435806d3	11-Dec-2007	David Xu <davidxu@FreeBSD.org>	Fix LOR of thread lock and umtx's priority propagation mutex due to the reworking of scheduler lock. MFC: after 3 days
# fb62eea2	16-Jul-2007	Jeff Roberson <jeff@FreeBSD.org>	- Use ruxagg() in calcru() to make sure we have current tick information from all threads. Discussed with: bde, attilio Approved by: re
# 59d8f3ff	12-Jul-2007	John Baldwin <jhb@FreeBSD.org>	Fix a couple of issues with the stack limit for 32-bit processes on 64-bit kernels exposed by the recent fixes to resource limits for 32-bit processes on 64-bit kernels: - Let ABIs expose their maximum stack size via a new pointer in sysentvec and use that in preference to maxssiz during exec() rather than always using maxssiz for all processses. - Apply the ABI's limit fixup to the previous stack size when adjusting RLIMIT_STACK to determine if the existing mapping for the stack needs to be grown or shrunk (as well as how much it should be grown or shrunk). Approved by: re (kensmith)
# 7e273744	14-Jun-2007	Robert Watson <rwatson@FreeBSD.org>	Remove the restriction that rtprio(2) cannot be used to set the realtime or idle priority of another process owned by the same user. This means that privilege in rtprio(2) (and rtprio_thread(2)) is required indirectly via p_cansched(9) or directly to set realtime/idle privilege, rather than directly affecting target process authorization.
# 32f9753c	11-Jun-2007	Robert Watson <rwatson@FreeBSD.org>	Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present. Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c. We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h. Reviewed by: csjp Obtained from: TrustedBSD Project
# a1fe14bc	09-Jun-2007	Attilio Rao <attilio@FreeBSD.org>	rufetch and calcru sometimes should be called atomically together. This patch fixes places where they should be called atomically changing their locking requirements (both assume per-proc spinlock held) and introducing rufetchcalc which wrappers both calls to be performed in atomic way. Reviewed by: jeff Approved by: jeff (mentor)
# a140976e	09-Jun-2007	Attilio Rao <attilio@FreeBSD.org>	The current rusage code show peculiar problems: - Unsafeness on ruadd() in thread_exit() - Unatomicity of thread_exiit() in the exit1() operations This patch addresses these problems allocating p_fd as part of the process and modifying the way it is accessed. A small chunk of this patch, resolves a race about p_state in kern_wait(), since we have to be sure about the zombif-ing process. Submitted by: jeff Approved by: jeff (mentor)
# 982d11f8	04-Jun-2007	Jeff Roberson <jeff@FreeBSD.org>	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
# 1c4bcd05	31-May-2007	Jeff Roberson <jeff@FreeBSD.org>	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)
# e1e8f51b	27-May-2007	Robert Watson <rwatson@FreeBSD.org>	Universally adopt most conventional spelling of acquire.
# 19059a13	14-May-2007	John Baldwin <jhb@FreeBSD.org>	Rework the support for ABIs to override resource limits (used by 32-bit processes under 64-bit kernels). Previously, each 32-bit process overwrote its resource limits at exec() time. The problem with this approach is that the new limits affect all child processes of the 32-bit process, including if the child process forks and execs a 64-bit process. To fix this, don't ovewrite the resource limits during exec(). Instead, sv_fixlimits() is now replaced with a different function sv_fixlimit() which asks the ABI to sanitize a single resource limit. We then use this when querying and setting resource limits. Thus, if a 32-bit process sets a limit, then that new limit will be inherited by future children. However, if the 32-bit process doesn't change a limit, then a future 64-bit child will see the "full" 64-bit limit rather than the 32-bit limit. MFC is tentative since it will break the ABI of old linux.ko modules (no other modules are affected). MFC after: 1 week
# 873fbcd7	05-Mar-2007	Robert Watson <rwatson@FreeBSD.org>	Further system call comment cleanup: - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde) - Remove extra blank lines in some cases. - Add extra blank lines in some cases. - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name. - Add punctuation. - Re-wrap some comments.
# 0c14ff0e	04-Mar-2007	Robert Watson <rwatson@FreeBSD.org>	Remove 'MPSAFE' annotations from the comments above most system calls: all system calls now enter without Giant held, and then in some cases, acquire Giant explicitly. Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.
# 1ad9ee86	25-Feb-2007	Xin LI <delphij@FreeBSD.org>	Close race conditions between fork() and [sg]etpriority()'s PRIO_USER case, possibly also other places that deferences p_ucred. In the past, we insert a new process into the allproc list right after PID allocation, and release the allproc_lock sx. Because most content in new proc's structure is not yet initialized, this could lead to undefined result if we do not handle PRS_NEW with care. The problem with PRS_NEW state is that it does not provide fine grained information about how much initialization is done for a new process. By defination, after PRIO_USER setpriority(), all processes that belongs to given user should have their nice value set to the specified value. Therefore, if p_{start,end}copy section was done for a PRS_NEW process, we can not safely ignore it because p_nice is in this area. On the other hand, we should be careful on PRS_NEW processes because we do not allow non-root users to lower their nice values, and without a successful copy of the copy section, we can get stale values that is inherted from the uninitialized area of the process structure. This commit tries to close the race condition by grabbing proc mutex before we release allproc_lock xlock, and do copy as well as zero immediately after the allproc_lock xunlock. This guarantees that the new process would have its p_copy and p_zero sections, as well as user credential informaion initialized. In getpriority() case, instead of grabbing PROC_LOCK for a PRS_NEW process, we just skip the process in question, because it does not affect the final result of the call, as the p_nice value would be copied from its parent, and we will see it during allproc traverse. Other potential solutions are still under evaluation. Discussed with: davidxu, jhb, rwatson PR: kern/108071 MFC after: 2 weeks
# 86138fc7	19-Feb-2007	Robert Watson <rwatson@FreeBSD.org>	Use priv_check(9) instead of suser(9) for checking the privilege to set real-time priority on a thread. It looks like this suser(9) call was introduced after my first pass through replacing superuser checks with named privilege checks.
# 4f506694	17-Jan-2007	Xin LI <delphij@FreeBSD.org>	Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.
# ad1e7d28	05-Dec-2006	Julian Elischer <julian@FreeBSD.org>	Threading cleanup.. part 2 of several. Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still reqires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.
# fa0d3a32	19-Nov-2006	David Xu <davidxu@FreeBSD.org>	Use scheduler API sched_user_prio() to adjust thread's userland priority, use td_base_user_prio to get real userland priority since POSIX priority mutex may adjust td_user_pri which is an effective priority.
# acd3428b	06-Nov-2006	Robert Watson <rwatson@FreeBSD.org>	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
# 8460a577	26-Oct-2006	John Birrell <jb@FreeBSD.org>	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@
# 73fa3e5b	20-Sep-2006	David Xu <davidxu@FreeBSD.org>	Replace system call thr_getscheduler, thr_setscheduler, thr_setschedparam with rtprio_thread, while rtprio system call is for process only, the new system call rtprio_thread is responsible for LWP.
# 776fc0e9	04-Aug-2006	Yaroslav Tykhiy <ytykhiy@gmail.com>	Commit the results of the typo hunt by Darren Pilgrim. This change affects documentation and comments only, no real code involved. PR: misc/101245 Submitted by: Darren Pilgrim <darren pilgrim bitfreak org> Tested by: md5(1) MFC after: 1 week
# 272601f8	11-Mar-2006	Poul-Henning Kamp <phk@FreeBSD.org>	Go over calcru and friends once more. Reintroduce the monotonicity for the normal case and make the two special cases behave in what is belived to be the most sensible fasion.
# 0f038c05	09-Mar-2006	Poul-Henning Kamp <phk@FreeBSD.org>	Add slop to "backwards" cpu accounting messages, 3 usec or 1% whichever triggers. This should eliminate all the trivial messages which result from minor increases in cpu_tick frequency. Machines which don't du cpu clock fiddling shouldn't issue "backwards" messages now. Laptops and other machines where the initial estimate of cputicks may be waaaay off will still issue warnings.
# 8f95fc24	22-Feb-2006	John Baldwin <jhb@FreeBSD.org>	Various style and comment fixes. Submitted by: bde
# 6fc6433e	21-Feb-2006	John Baldwin <jhb@FreeBSD.org>	Split calcru() back into a calcru1() function shared with calccru() and a calcru() wrapper that passes a local rusage_ext on the stack that is a snapshot to do the calculations on. Now we can pass p->p_crux to calcru1() in calccru() again which fixes the issues with runtime going backwards messages when dead processes are harvested by init. Reviewed by: phk Tested by: Stefan Ehmann shoesoft at gmx dot net
# e8444a7e	11-Feb-2006	Poul-Henning Kamp <phk@FreeBSD.org>	CPU time accounting speedup (step 2) Keep accounting time (in per-cpu) cputicks and the statistics counts in the thread and summarize into struct proc when at context switch. Don't reach across CPUs in calcru(). Add code to calibrate the top speed of cpu_tickrate() for variable cpu_tick hardware (like TSC on power managed machines). Don't enforce monotonicity (at least for now) in calcru. While the calibrated cpu_tickrate ramps up it may not be true. Use 27MHz counter on i386/Geode. Use TSC on amd64 & i386 if present. Use tick counter on sparc64
# 5b1a8eb3	07-Feb-2006	Poul-Henning Kamp <phk@FreeBSD.org>	Modify the way we account for CPU time spent (step 1) Keep track of time spent by the cpu in various contexts in units of "cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t only when somebody wants to inspect the numbers. For now "cputicks" are still derived from the current timecounter and therefore things should by definition remain sensible also on SMP machines. (The main reason for this first milestone commit is to verify that hypothesis.) On slower machines, the avoided multiplications to normalize timestams at every context switch, comes out as a 5-7% better score on the unixbench/context1 microbenchmark. On more modern hardware no change in performance is seen.
# 6807424d	24-Jan-2006	Stephan Uphoff <ups@FreeBSD.org>	Back out changes made in rev. 1.151. They were bogus. Cluebat applied by: jhb@
# 03001f59	23-Jan-2006	Stephan Uphoff <ups@FreeBSD.org>	Hopefully fix the "calcru: runtime went backwards from ..." problem by keeping the resource values locked (where needed) while we use them for calculations. MFC after: 3 days
# 1471f287	02-Nov-2005	Paul Saab <ps@FreeBSD.org>	Calling setrlimit from 32bit apps could potentially increase certain limits beyond what should be capiable in a 32bit process, so we must fixup the limits. Reviewed by: jhb
# b2149bde	27-Sep-2005	John Baldwin <jhb@FreeBSD.org>	Use the reference count API to manage the reference counts for process limit structures rather than using pool mutexes to protect the reference counts. Tested on: i386, alpha, sparc64
# f0e51320	01-Jun-2005	Alan Cox <alc@FreeBSD.org>	Giant is no longer required in kern_setrlimit(); remove its acquisition and release. Reviewed by: jhb
# 63710c4d	30-Dec-2004	John Baldwin <jhb@FreeBSD.org>	Stop explicitly touching td_base_pri outside of the scheduler and simply set a thread's priority via sched_prio() when that is the desired action. The schedulers will start managing td_base_pri internally shortly.
# 78c85e8d	05-Oct-2004	John Baldwin <jhb@FreeBSD.org>	Rework how we store process times in the kernel such that we always store the raw values including for child process statistics and only compute the system and user timevals on demand. - Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result. - Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage(). - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc. - Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel. - calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did. - calcru() now correctly handles threads executing on other CPUs. - A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock. - This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone. - The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console. Submitted by: bde (mostly) MFC after: 1 month
# 6111dcd2	23-Sep-2004	John Baldwin <jhb@FreeBSD.org>	A modest collection of various and sundry style, spelling, and whitespace fixes. Submitted by: bde (mostly)
# 7eaec467	22-Sep-2004	John Baldwin <jhb@FreeBSD.org>	Various small style fixes.
# 5dd3a4ed	06-Aug-2004	Robert Watson <rwatson@FreeBSD.org>	Push UIDINFO_UNLOCK() slightly earlier in chgsbize(), as it's not needed if we print the local variable version of the limit rather than the shared version.
# 1b93405c	04-Aug-2004	Robert Watson <rwatson@FreeBSD.org>	Remove spl's from kern_resource.c.
# 56f21b9d	26-Jul-2004	Colin Percival <cperciva@FreeBSD.org>	Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb
# ba39a1c5	21-Jun-2004	Bruce Evans <bde@FreeBSD.org>	Turned off the "calcru: negative time" warning for certain SMP cases where it is known to detect a problem but the problem is not very easy to fix. The warning became very common recently after a call to calcru() was added to fill_kinfo_thread(). Another (much older) cause of "negative times" (actually non-monotonic times) was fixed in rev.1.237 of kern_exit.c. Print separate messages for non-monotonic and negative times.
# fa885116	15-Jun-2004	Julian Elischer <julian@FreeBSD.org>	Nice, is a property of a process as a whole.. I mistakenly moved it to the ksegroup when breaking up the process structure. Put it back in the proc structure.
# 1930e303	11-Jun-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Deorbit COMPAT_SUNOS. We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither a sparc32 port nor a SunOS4.x compatibility desire these days.
# 60f798c1	08-May-2004	Julian Elischer <julian@FreeBSD.org>	Fix rtprio() to do sensible things when called from threaded processes. It's not quite correct from a posix Point Of view, but it is a lot better than what was there before. This will be revisited later when we decide what form our priority extensions will take. Posix doesn't specify how a system scope thread can change its priority so you need to add non-standard extensions to be able to do it.. For now make this slightly non standard to allow it to be done. Submitted by: Dan Eischen originally, changed by myself.
# 4ddd1e65	10-Apr-2004	Maxime Henrion <mux@FreeBSD.org>	Remove a comment that complains about the lack of %qd, to justify truncating a rlim_t to a long. We have %qd since some time now. However, the correct format to use here is %jd and a cast to intmax_t, so do this.
# 7f8a436f	05-Apr-2004	Warner Losh <imp@FreeBSD.org>	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core
# e7a44cac	11-Feb-2004	John Baldwin <jhb@FreeBSD.org>	Argh! Fix a bogon. lim_cur() was returning the hard (max) limit rather than the soft (cur) limit. Submitted by: bde
# a875f385	06-Feb-2004	John Baldwin <jhb@FreeBSD.org>	- Convert the plimit lock to a pool mutex lock. - Hide struct plimit from userland. Submitted by: bde (2)
# f4daf056	06-Feb-2004	John Baldwin <jhb@FreeBSD.org>	- Correct the translation of old rlimit values to properly handle the old RLIM_INFINITY case for ogetrlimit(). - Use %jd and intmax_t to output negative time in usec in calcru(). - Rework getrusage() to make a copy of the rusage struct into a local variable while holding Giant and then do the copyout from the local variable to avoid having to have the original process rusage struct locked while doing the copyout (which would not be safe). This also includes a few style fixes from Bruce to getrusage(). Submitted by: bde (1, parts of 3) Suggested by: bde (2)
# 99b6e02b	06-Feb-2004	John Baldwin <jhb@FreeBSD.org>	A few more style fixes from Bruce including a few I missed last time. Submitted by: bde
# b4323d77	05-Feb-2004	John Baldwin <jhb@FreeBSD.org>	- A lot of style and whitespace fixes. - Update a few comments regarding locking notes. Submitted by: bde (1, mostly)
# 91d5354a	04-Feb-2004	John Baldwin <jhb@FreeBSD.org>	Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64
# eab9cabf	27-Oct-2003	Jeff Roberson <jeff@FreeBSD.org>	- Don't set td_priority directly here, use sched_prio().
# 857d9c60	12-Jul-2003	Don Lewis <truckman@FreeBSD.org>	Extend the mutex pool implementation to permit the creation and use of multiple mutex pools with different options and sizes. Mutex pools can be created with either the default sleep mutexes or with spin mutexes. A dynamically created mutex pool can now be destroyed if it is no longer needed. Create two pools by default, one that matches the existing pool that uses the MTX_NOWITNESS option that should be used for building higher level locks, and a new pool with witness checking enabled. Modify the users of the existing mutex pool to use the appropriate pool in the new implementation. Reviewed by: jhb
# 677b542e	10-Jun-2003	David E. O'Brien <obrien@FreeBSD.org>	Use __FBSDID().
# 4d923fe3	23-Apr-2003	John Baldwin <jhb@FreeBSD.org>	Remove Giant from [gs]etpriority().
# a15cc359	22-Apr-2003	John Baldwin <jhb@FreeBSD.org>	Lock both the proc lock and sched_lock when calling sched_nice since kg_nice is now protected by both. Being protected by both means that other places in the kernel that want to read kg_nice only need one of the two locks.
# 08865ba1	18-Apr-2003	John Baldwin <jhb@FreeBSD.org>	Add a couple of sched_lock asserts.
# f6f230fe	10-Apr-2003	Jeff Roberson <jeff@FreeBSD.org>	- Adjust sched hooks for fork and exec to take processes as arguments instead of ksegs since they primarily operation on processes. - KSEs take ticks so pass the kse through sched_clock(). - Add a sched_class() routine that adjusts a ksegrp pri class. - Define a sched_fork_{kse,thread,ksegrp} and sched_exit_{kse,thread,ksegrp} that will be used to tell the scheduler about new instances of these structures within the same process. These will be used by THR and KSE. - Change sched_4bsd to reflect this API update.
# 262c27b8	12-Mar-2003	Tim J. Robbins <tjr@FreeBSD.org>	Back out previous. The locking here needs a rethink.
# a7cbe87a	12-Mar-2003	Tim J. Robbins <tjr@FreeBSD.org>	Acquire sched_lock around use of FOREACH_KSEGRP_IN_PROC, accesses to kg_nice and calls to sched_nice() in getpriority() and setpriority() (really donice()).
# 27e39ae4	19-Feb-2003	Tim J. Robbins <tjr@FreeBSD.org>	Remove the PL_SHAREMOD flag from struct plimit, which could have been used to share resource limits between rfork threads, but never was. Removing it makes resource limit locking much simpler -- only the current process can change the contents of the structure that p_limit points to.
# a163d034	18-Feb-2003	Warner Losh <imp@FreeBSD.org>	Back out M_* changes, per decision of the TRB. Approved by: trb
# e4625663	16-Feb-2003	Jeff Roberson <jeff@FreeBSD.org>	- Move ke_sticks, ke_iticks, ke_uticks, ke_uu, ke_su, and ke_iu back into the proc. These counters are only examined through calcru. Submitted by: davidxu Tested on: x86, alpha, UP/SMP
# 5ce623b8	13-Feb-2003	Tim J. Robbins <tjr@FreeBSD.org>	Add an XXX comment noting that getrusage() accesses p_stats->p_ru and p_stats->p_cru without holding the appropriate locks.
# 6f8132a8	31-Jan-2003	Julian Elischer <julian@FreeBSD.org>	Reversion of commit by Davidxu plus fixes since applied. I'm not convinced there is anything major wrong with the patch but them's the rules.. I am using my "David's mentor" hat to revert this as he's offline for a while.
# 0dbb100b	26-Jan-2003	David Xu <davidxu@FreeBSD.org>	Move UPCALL related data structure out of kse, introduce a new data structure called kse_upcall to manage UPCALL. All KSE binding and loaning code are gone. A thread owns an upcall can collect all completed syscall contexts in its ksegrp, turn itself into UPCALL mode, and takes those contexts back to userland. Any thread without upcall structure has to export their contexts and exit at user boundary. Any thread running in user mode owns an upcall structure, when it enters kernel, if the kse mailbox's current thread pointer is not NULL, then when the thread is blocked in kernel, a new UPCALL thread is created and the upcall structure is transfered to the new UPCALL thread. if the kse mailbox's current thread pointer is NULL, then when a thread is blocked in kernel, no UPCALL thread will be created. Each upcall always has an owner thread. Userland can remove an upcall by calling kse_exit, when all upcalls in ksegrp are removed, the group is atomatically shutdown. An upcall owner thread also exits when process is in exiting state. when an owner thread exits, the upcall it owns is also removed. KSE is a pure scheduler entity. it represents a virtual cpu. when a thread is running, it always has a KSE associated with it. scheduler is free to assign a KSE to thread according thread priority, if thread priority is changed, KSE can be moved from one thread to another. When a ksegrp is created, there is always N KSEs created in the group. the N is the number of physical cpu in the current system. This makes it is possible that even an userland UTS is single CPU safe, threads in kernel still can execute on different cpu in parallel. Userland calls kse_create to add more upcall structures into ksegrp to increase concurrent in userland itself, kernel is not restricted by number of upcalls userland provides. The code hasn't been tested under SMP by author due to lack of hardware. Reviewed by: julian
# 44956c98	21-Jan-2003	Alfred Perlstein <alfred@FreeBSD.org>	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
# b43179fb	11-Oct-2002	Jeff Roberson <jeff@FreeBSD.org>	- Create a new scheduler api that is defined in sys/sched.h - Begin moving scheduler specific functionality into sched_4bsd.c - Replace direct manipulation of scheduler data with hooks provided by the new api. - Remove KSE specific state modifications and single runq assumptions from kern_switch.c Reviewed by: -arch
# 5715307f	09-Oct-2002	John Baldwin <jhb@FreeBSD.org>	- Move p_cpulimit to struct proc from struct plimit and protect it with sched_lock. This means that we no longer access p_limit in mi_switch() and the p_limit pointer can be protected by the proc lock. - Remove PRS_ZOMBIE check from CPU limit test in mi_switch(). PRS_ZOMBIE processes don't call mi_switch(), and even if they did there is no longer the danger of p_limit being NULL (which is what the original zombie check was added for). - When we bump the current processes soft CPU limit in ast(), just bump the private p_cpulimit instead of the shared rlimit. This fixes an XXX for some value of fix. There is still a (probably benign) bug in that this code doesn't check that the new soft limit exceeds the hard limit. Inspired by: bde (2)
# f4cd8f9f	30-Sep-2002	John Baldwin <jhb@FreeBSD.org>	Change p_cpulimit to be in seconds instead of microseconds. Since p_runtime now is a bintime, it is no longer an optimization to store p_cpulimit as microseconds. Suggested by: phk
# 05ba50f5	21-Sep-2002	Jake Burkholder <jake@FreeBSD.org>	Use the fields in the sysentvec and in the vm map header in place of the constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS. This is mainly so that they can be variable even for the native abi, based on different machine types. Get stack protections from the sysentvec too. This makes it trivial to map the stack non-executable for certain abis, on machines that support it.
# 4f0db5e0	15-Sep-2002	Julian Elischer <julian@FreeBSD.org>	Allocate KSEs and KSEGRPs separatly and remove them from the proc structure. next step is to allow > 1 to be allocated per process. This would give multi-processor threads. (when the rest of the infrastructure is in place) While doing this I noticed libkvm and sys/kern/kern_proc.c:fill_kinfo_proc are diverging more than they should.. corrective action needed soon.
# f824b518	23-Jul-2002	John Polstra <jdp@FreeBSD.org>	Widen struct sockbuf's sb_timeo member to int from short. With non-default but reasonable values of hz this member overflowed, breaking NFS over UDP. Also, as long as I'm plowing up struct sockbuf ... Change certain members from u_long/long to u_int/int in order to reduce wasted space on 64-bit machines. This change was requested by Andrew Gallatin. Netstat and systat need to be rebuilt. I am incrementing __FreeBSD_version in case any ports need to change.
# 01609114	28-Jun-2002	Alfred Perlstein <alfred@FreeBSD.org>	more caddr_t removal.
# f44d9e24	18-May-2002	John Baldwin <jhb@FreeBSD.org>	Change p_can{debug,see,sched,signal}()'s first argument to be a thread pointer instead of a proc pointer and require the process pointed to by the second argument to be locked. We now use the thread ucred reference for the credential checks in p_can*() as a result. p_canfoo() should now no longer need Giant.
# ba626c1d	16-Apr-2002	John Baldwin <jhb@FreeBSD.org>	Lock proctree_lock instead of pgrpsess_lock.
# bad56603	13-Apr-2002	John Baldwin <jhb@FreeBSD.org>	- Change donice() to take a thread as the first argument instead of a process so it can use td_ucred. - Require the target process of donice() to be locked when donice() is called. - Use td_ucred. - Lock the target process of p_cansee() and while reading the credentials of a process. - Change the logic of rtprio() slightly so it does it's copyin() if needed prior to locking the target process. - rtprio() no longer needs Giant. In theory with full KSE it would still need Giant to protect p_ucred of curproc for the p_canfoo() functions but p_canfoo() will be changing to using td_ucred of curthread before full KSE hits the tree.
# 6008862b	04-Apr-2002	John Baldwin <jhb@FreeBSD.org>	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64
# 44731cab	01-Apr-2002	John Baldwin <jhb@FreeBSD.org>	Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@
# 4d77a549	19-Mar-2002	Alfred Perlstein <alfred@FreeBSD.org>	Remove __P.
# c91f7a73	26-Feb-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Cast the variable, not the constant to 64 bits.
# f591779b	23-Feb-2002	Seigo Tanimura <tanimura@FreeBSD.org>	Lock struct pgrp, session and sigio. New locks are: - pgrpsess_lock which locks the whole pgrps and sessions, - pg_mtx which protects the pgrp members, and - s_mtx which protects the session members. Please refer to sys/proc.h for the coverage of these locks. Changes on the pgrp/session interface: - pgfind() needs the pgrpsess_lock held. - The caller of enterpgrp() is responsible to allocate a new pgrp and session. - Call enterthispgrp() in order to enter an existing pgrp. - pgsignal() requires a pgrp lock held. Reviewed by: jhb, alfred Tested on: cvsup.jp.FreeBSD.org (which is a quad-CPU machine running -current)
# 1cbb9c3b	22-Feb-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Convert p->p_runtime and PCPU(switchtime) to bintime format.
# 2c100766	11-Feb-2002	Julian Elischer <julian@FreeBSD.org>	In a threaded world, differnt priorirites become properties of different entities. Make it so. Reviewed by: jhb@freebsd.org (john baldwin)
# 547ce823	20-Jan-2002	Alfred Perlstein <alfred@FreeBSD.org>	use mutex pool mutexes for uidinfo locking. replace mutex_lock calls on uidinfo with macro calls: mtx_lock(&uidp->ui_mtx) -> UIDINFO_LOCK(uidp) Terry Lambert <tlambert2@mindspring.com> helped with this.
# 9aefe36f	04-Nov-2001	Peter Wemm <peter@FreeBSD.org>	* empty log message *
# fc5d29ef	01-Nov-2001	Robert Watson <rwatson@FreeBSD.org>	o Move suser() calls in kern/ to using suser_xxx() with an explicit credential selection, rather than reference via a thread or process pointer. This is part of a gradual migration to suser() accepting a struct ucred instead of a struct proc, simplifying the reference and locking semantics of suser(). Obtained from: TrustedBSD Project
# 0e9fe212	28-Oct-2001	Matthew Dillon <dillon@FreeBSD.org>	Adjust printfs to be time_t agnostic.
# cbc89bfb	10-Oct-2001	Paul Saab <ps@FreeBSD.org>	Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader tunable. Reviewed by: peter MFC after: 2 weeks
# b40ce416	12-Sep-2001	Julian Elischer <julian@FreeBSD.org>	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# e342cd27	01-Sep-2001	John Baldwin <jhb@FreeBSD.org>	Use sched_lock to protect rtp_to_pri() and pri_to_rtp() when needed.
# 835a82ee	01-Sep-2001	Matthew Dillon <dillon@FreeBSD.org>	Giant Pushdown. Saved the worst P4 tree breakage for last. reboot() getpriority() setpriority() rtprio() osetrlimit() ogetrlimit() setrlimit() getrlimit() getrusage() getpid() getppid() getpgrp() getpgid() getsid() getgid() getegid() getgroups() setsid() setpgid() setuid() seteuid() setgid() setegid() setgroups() setreuid() setregid() setresuid() setresgid() getresuid() getresgid () __setugid() getlogin() setlogin() modnext() modfnext() modstat() modfind() kldload() kldunload() kldfind() kldnext() kldstat() kldfirstmod() kldsym() getdtablesize() dup2() dup() fcntl() close() ofstat() fstat() nfsstat() fpathconf() flock()
# 8cfdf322	21-Jul-2001	Assar Westerlund <assar@FreeBSD.org>	add prototype for dosetrlimit
# a0f75161	05-Jul-2001	Robert Watson <rwatson@FreeBSD.org>	o Replace calls to p_can(..., P_CAN_xxx) with calls to p_canxxx(). The p_can(...) construct was a premature (and, it turns out, awkward) abstraction. The individual calls to p_canxxx() better reflect differences between the inter-process authorization checks, such as differing checks based on the type of signal. This has a side effect of improving code readability. o Replace direct credential authorization checks in ktrace() with invocation of p_candebug(), while maintaining the special case check of KTR_ROOT. This allows ktrace() to "play more nicely" with new mandatory access control schemes, as well as making its authorization checks consistent with other "debugging class" checks. o Eliminate "privused" construct for p_can*() calls which allowed the caller to determine if privilege was required for successful evaluation of the access control check. This primitive is currently unused, and as such, serves only to complicate the API. Approved by: ({procfs,linprocfs} changes) des Obtained from: TrustedBSD Project
# 0cddd8f0	04-Jul-2001	Matthew Dillon <dillon@FreeBSD.org>	With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
# 23955314	18-May-2001	Alfred Perlstein <alfred@FreeBSD.org>	Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb
# e6af1080	29-Apr-2001	Jake Burkholder <jake@FreeBSD.org>	Make rtprio work again. - add a missing break which caused RTP_SET to always return EINVAL - break instead of returning if p_can fails so proc_lock is always dropped correctly - only copyin data that is actually needed - use break instead of goto - make rtp_to_pri return EINVAL instead of -1 if the values are out or range so we don't have to translate
# 33a9ed9d	23-Apr-2001	John Baldwin <jhb@FreeBSD.org>	Change the pfind() and zpfind() functions to lock the process that they find before releasing the allproc lock and returning. Reviewed by: -smp, dfr, jake
# d34f8d30	12-Apr-2001	Robert Watson <rwatson@FreeBSD.org>	o Limit process information leakage by introducing a p_can(...P_CAN_SEE...) in rtprio()'s RTP_LOOKIP implementation. Obtained from: TrustedBSD Project
# 1005a129	28-Mar-2001	John Baldwin <jhb@FreeBSD.org>	Convert the allproc and proctree locks from lockmgr locks to sx locks.
# f34fa851	28-Mar-2001	John Baldwin <jhb@FreeBSD.org>	Catch up to header include changes: - <sys/mutex.h> now requires <sys/systm.h> - <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>
# 9708152c	09-Mar-2001	Alfred Perlstein <alfred@FreeBSD.org>	Don't call malloc with M_WAITOK while holding a mutex.
# 35030da9	22-Feb-2001	Tor Egge <tegge@FreeBSD.org>	Backout previous commit. sched_lock is held, thus interrupts are prevented here. Submitted by: jhb
# 0d139b37	22-Feb-2001	Tor Egge <tegge@FreeBSD.org>	Protect update of the per processor switchtime variable against interrupts. Protect usage of the per processor switchtime variable against interrupts in calcru(). This seem to eliminate the "microuptime() went backwards" warnings.
# d82b3e31	20-Feb-2001	Tor Egge <tegge@FreeBSD.org>	Ensure that RLIMIT_NPROC limits are at least 1 to avoid bad interaction with chgproccnt. MFC candiate. Reviewed by: alfred
# d5a08a60	11-Feb-2001	Jake Burkholder <jake@FreeBSD.org>	Implement a unified run queue and adjust priority levels accordingly. - All processes go into the same array of queues, with different scheduling classes using different portions of the array. This allows user processes to have their priorities propogated up into interrupt thread range if need be. - I chose 64 run queues as an arbitrary number that is greater than 32. We used to have 4 separate arrays of 32 queues each, so this may not be optimal. The new run queue code was written with this in mind; changing the number of run queues only requires changing constants in runq.h and adjusting the priority levels. - The new run queue code takes the run queue as a parameter. This is intended to be used to create per-cpu run queues. Implement wrappers for compatibility with the old interface which pass in the global run queue structure. - Group the priority level, user priority, native priority (before propogation) and the scheduling class into a struct priority. - Change any hard coded priority levels that I found to use symbolic constants (TTIPRI and TTOPRI). - Remove the curpriority global variable and use that of curproc. This was used to detect when a process' priority had lowered and it should yield. We now effectively yield on every interrupt. - Activate propogate_priority(). It should now have the desired effect without needing to also propogate the scheduling class. - Temporarily comment out the call to vm_page_zero_idle() in the idle loop. It interfered with propogate_priority() because the idle process needed to do a non-blocking acquire of Giant and then other processes would try to propogate their priority onto it. The idle process should not do anything except idle. vm_page_zero_idle() will return in the form of an idle priority kernel thread which is woken up at apprioriate times by the vm system. - Update struct kinfo_proc to the new priority interface. Deliberately change its size by adjusting the spare fields. It remained the same size, but the layout has changed, so userland processes that use it would parse the data incorrectly. The size constraint should really be changed to an arbitrary version number. Also add a debug.sizeof sysctl node for struct kinfo_proc.
# 9ed346ba	08-Feb-2001	Bosko Milekic <bmilekic@FreeBSD.org>	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
# 7871c121	24-Jan-2001	John Baldwin <jhb@FreeBSD.org>	- Add a mtx_assert() for sched_lock in calcru(). - Protect calcru() with sched_lock later on in the file when it is called.
# ef73ae4b	09-Jan-2001	Jake Burkholder <jake@FreeBSD.org>	Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables other then curproc.
# c0c25570	12-Dec-2000	Jake Burkholder <jake@FreeBSD.org>	- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead of explicit calls to lockmgr. Also provides macros for the flags pased to specify shared, exclusive or release which map to the lockmgr flags. This is so that the use of lockmgr can be easily replaced with optimized reader-writer locks. - Add some locking that I missed the first time.
# c5a86b0a	30-Nov-2000	Alfred Perlstein <alfred@FreeBSD.org>	Translate alfred to english. Submitted by: bde
# 1baf4aab	30-Nov-2000	Alfred Perlstein <alfred@FreeBSD.org>	use a oppurtunistic locking strategy with the uidinfo structures to avoid locking the global hash on each uifree() make struct uidinfo only visible to the kernel make uihold() a function rather than a macro to reduce bloat swap the order of a spl/mutex to maintain consistancy
# 9c19bcdd	25-Nov-2000	Alfred Perlstein <alfred@FreeBSD.org>	Make uidinfo subsystem mpsafe use a mutex lock when looking up/deleting entries on the hashlist use a mutex lock on each uidinfo when updating fields make uifree() a void function rather than 'int' since no one cares allocate uidinfo structs with the M_ZERO flag and don't explicitly initialize them Assisted by: eivind, jhb, jakeb
# 553629eb	22-Nov-2000	Jake Burkholder <jake@FreeBSD.org>	Protect the following with a lockmgr lock: allproc zombproc pidhashtbl proc.p_list proc.p_hash nextpid Reviewed by: jhb Obtained from: BSD/OS and netbsd
# b429049a	18-Sep-2000	Paul Saab <ps@FreeBSD.org>	Add new line character to debugging printf's.
# 0384fff8	06-Sep-2000	Jason Evans <jasone@FreeBSD.org>	Major update to the way synchronization is done in the kernel. Highlights include: * Mutual exclusion is used instead of spl(). See mutex(9). (Note: The alpha port is still in transition and currently uses both.) Per-CPU idle processes. * Interrupts are run in their own separate kernel threads and can be preempted (i386 only). Partially contributed by: BSDi (BSD/OS) Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh
# c5930ee4	06-Sep-2000	Don Lewis <truckman@FreeBSD.org>	Change the calls to panic() in uifree(), chgproccnt(), and chgsbsize() to printf(). Any errors detected are not likely to be fatal, so it should be safe to let things keep running.
# f535380c	05-Sep-2000	Don Lewis <truckman@FreeBSD.org>	Remove uidinfo hash table lookup and maintenance out of chgproccnt() and chgsbsize(), which are called rather frequently and may be called from an interrupt context in the case of chgsbsize(). Instead, do the hash table lookup and maintenance when credentials are changed, which is a lot less frequent. Add pointers to the uidinfo structures to the ucred and pcred structures for fast access. Pass a pointer to the credential to chgproccnt() and chgsbsize() instead of passing the uid. Add a reference count to the uidinfo structure and use it to decide when to free the structure rather than freeing the structure when the resource consumption drops to zero. Move the resource tracking code from kern_proc.c to kern_resource.c. Move some duplicate code sequences in kern_prot.c to separate helper functions. Change KASSERTs in this code to unconditional tests and calls to panic().
# 387d2c03	29-Aug-2000	Robert Watson <rwatson@FreeBSD.org>	o Centralize inter-process access control, introducing: int p_can(p1, p2, operation, privused) which allows specification of subject process, object process, inter-process operation, and an optional call-by-reference privused flag, allowing the caller to determine if privilege was required for the call to succeed. This allows jail, kern.ps_showallprocs and regular credential-based interaction checks to occur in one block of code. Possible operations are P_CAN_SEE, P_CAN_SCHED, P_CAN_KILL, and P_CAN_DEBUG. p_can currently breaks out as a wrapper to a series of static function checks in kern_prot, which should not be invoked directly. o Commented out capabilities entries are included for some checks. o Update most inter-process authorization to make use of p_can() instead of manual checks, PRISON_CHECK(), P_TRESPASS(), and kern.ps_showallprocs. o Modify suser{,_xxx} to use const arguments, as it no longer modifies process flags due to the disabling of ASU. o Modify some checks/errors in procfs so that ENOENT is returned instead of ESRCH, further improving concealment of processes that should not be visible to other processes. Also introduce new access checks to improve hiding of processes for procfs_lookup(), procfs_getattr(), procfs_readdir(). Correct a bug reported by bp concerning not handling the CREATE case in procfs_lookup(). Remove volatile flag in procfs that caused apparently spurious qualifier warnigns (approved by bde). o Add comment noting that ktrace() has not been updated, as its access control checks are different from ptrace(), whereas they should probably be the same. Further discussion should happen on this topic. Reviewed by: bde, green, phk, freebsd-security, others Approved by: bde Obtained from: TrustedBSD Project
# 0a7d1711	23-Aug-2000	Brian Feldman <green@FreeBSD.org>	Revert the suser -> suser_xxx change made previously. It was right before.
# 9b969686	16-Aug-2000	Brian Feldman <green@FreeBSD.org>	Fix a couple cases where p_trespass wasn't transitioned into place. Make RTP_SET (rtprio) only accessible to real root, not root in jails.
# c27f4d3c	10-Jun-2000	Poul-Henning Kamp <phk@FreeBSD.org>	fix a typo
# 7cadc266	03-Jun-2000	Robert Watson <rwatson@FreeBSD.org>	o Modify jail to limit creation of sockets to UNIX domain sockets, TCP/IP (v4) sockets, and routing sockets. Previously, interaction with IPv6 was not well-defined, and might be inappropriate for some environments. Similarly, sysctl MIB entries providing interface information also give out only addresses from those protocol domains. For the time being, this functionality is enabled by default, and toggleable using the sysctl variable jail.socket_unixiproute_only. In the future, protocol domains will be able to determine whether or not they are ``jail aware''. o Further limitations on process use of getpriority() and setpriority() by jailed processes. Addresses problem described in kern/17878. Reviewed by: phk, jmg
# 693e27d4	15-Feb-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Don't try to account for the partial quantum unless the process is curproc. This only makes any difference on SMP, where we used a (potentially very bogus) switchtime from our own CPU to calculate resource usage on another CPU. This should remove some if not all calcru() related warnings on SMP. Approved by: jkh
# 8950d244	28-Jan-2000	Brian Feldman <green@FreeBSD.org>	Fix a bug that could crash the system if you press ^T while a slower system is slowed down and in the right spot (a race condition in fork()). The "previous time" fields have moved from pstat to proc. Anything which uses KVM needs to be recompiled with a new libkvm/headers. A couple wacky u_quad_t's in struct proc are now u_int64_t (the same, but according to lack of 'quad's in proc.h and usage in kern_resource.c). This will have no effect on code. This has been make-world-and-installed-new-kernel-which-works-fine-tested. Reviewed by: bde (previous version)
# 8f04f6c7	29-Nov-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Add a bit of sanity checking and problem avoidance in case the timecounter hardware is bogus. This will produce a new warning "microuptime() went backwards" and try to not screw up the process resource accounting.
# 2e3c8fcb	16-Nov-1999	Poul-Henning Kamp <phk@FreeBSD.org>	This is a partial commit of the patch from PR 14914: Alot of the code in sys/kern directly accesses the Q_HEAD and Q_ENTRY structures for list operations. This patch makes all list operations in sys/kern use the queue(3) macros, rather than directly accessing the *Q_{HEAD,ENTRY} structures. This batch of changes compile to the same object files. Reviewed by: phk Submitted by: Jake Burkholder <jake@checker.org> PR: 14914
# 923502ff	29-Oct-1999	Poul-Henning Kamp <phk@FreeBSD.org>	useracc() the prequel: Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ\|WRITE} rather than B_{READ\|WRITE} as argument.
# d1f088da	11-Oct-1999	Peter Wemm <peter@FreeBSD.org>	Trim unused options (or #ifdef for undoc options). Submitted by: phk
# c3aac50f	27-Aug-1999	Peter Wemm <peter@FreeBSD.org>	$Id$ -> $FreeBSD$
# 75c13541	28-Apr-1999	Poul-Henning Kamp <phk@FreeBSD.org>	This Implements the mumbled about "Jail" feature. This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do. For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers". Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname. Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors. It generally does what one would expect, but setting up a jail still takes a little knowledge. A few notes: I have no scripts for setting up a jail, don't ask me for them. The IP number should be an alias on one of the interfaces. mount a /proc in each jail, it will make ps more useable. /proc/<pid>/status tells the hostname of the prison for jailed processes. Quotas are only sensible if you have a mountpoint per prison. There are no privisions for stopping resource-hogging. Some "#ifdef INET" and similar may be missing (send patches!) If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome! Tools, comments, patches & documentation most welcome. Have fun... Sponsored by: http://www.rndassociates.com/ Run for almost a year by: http://www.servetheweb.com/
# 1c308b81	26-Apr-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Change suser_xxx() to suser() where it applies.
# f711d546	27-Apr-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Suser() simplification: 1: s/suser/suser_xxx/ 2: Add new function: suser(struct proc ), prototyped in <sys/proc.h>. 3: s/suser_xxx($[a-zA-Z0-9_]$->p_ucred, \&\1->p_acflag)/suser(\1)/ The remaining suser_xxx() calls will be scrutinized and dealt with later. There may be some unneeded #include <sys/cred.h>, but they are left as an exercise for Bruce. More changes to the suser() API will come along with the "jail" code.
# 6fc8f347	13-Mar-1999	Bruce Evans <bde@FreeBSD.org>	Enforce monotonicity of apparent process user, system and interrupt times. PR: 975, 10402
# 56ce1a8d	11-Mar-1999	Bruce Evans <bde@FreeBSD.org>	Fixed runtime accounting. The time since the previous context switch was discarded on every call to calcru(). Hacking on the `switchtime' global for a related fix in rev.1.38 of kern_resource.c was too fragile and broke when p_switchtime went away. PR: 10402
# efc96764	05-Mar-1999	Bruce Evans <bde@FreeBSD.org>	The magic "no-cpu" cpu number is 0xff. Don't misrepresent cpu numbers as chars or use bogus casts in an attempt to unmisrepresnt them. In top, don't assume that 0xff is the only negative cpu number when cpu numbers are (mis)represented.
# e7ba67f2	28-Feb-1999	Bruce Evans <bde@FreeBSD.org>	Removed all traces of `p_switchtime'. The relevant timestamp is per-cpu, not per-process. Keep it in `switchtime' consistently. It is now clear that the timestamp is always valid in fork_trampoline() except when the child is running on a previously idle cpu, which can only happen if there are multiple cpus, so don't check or set the timestamp in fork_trampoline except in the (i386) SMP case. Just remove the alpha code for setting it unconditionally, since there is no SMP case for alpha and the code had rotted. Parts reviewed by: dfr, phk
# 1b0b259e	25-Feb-1999	Bruce Evans <bde@FreeBSD.org>	Don't forget to update `switchticks' in corner cases (except for the alpha fork_trampoline(), forget it because it I believe it is only necessary for the unsupported SMP case).
# ba198b1c	30-Jan-1999	Mark Newton <newton@FreeBSD.org>	Added comments about non-staticization so it doesn't get un-done next time someone goes on a staticization binge. Suggested by: eivind
# 69a6f20b	29-Jan-1999	Mark Newton <newton@FreeBSD.org>	Unstaticized routines which are needed by the svr4 KLD and the streams garbage needed to support SysVR4 networking.
# bc9e7c3b	27-Jul-1998	Bruce Evans <bde@FreeBSD.org>	Fixed double counting of runtime after a process exits. The last timeslice of the exiting process was counted for both the exiting process and the next process to run if the next process runs immediately. Broken in: mostly in kern_clock.c rev.1.70 (1998/05/28)
# e796e00d	28-May-1998	Poul-Henning Kamp <phk@FreeBSD.org>	Some cleanups related to timecounters and weird ifdefs in <sys/time.h>. Clean up (or if antipodic: down) some of the msgbuf stuff. Use an inline function rather than a macro for timecounter delta. Maintain process "on-cpu" time as 64 bits of microseconds to avoid needless second rollover overhead. Avoid calling microuptime the second time in mi_switch() if we do not pass through _idle in cpu_switch() This should reduce our context-switch overhead a bit, in particular on pre-P5 and SMP systems. WARNING: Programs which muck about with struct proc in userland will have to be fixed. Reviewed, but found imperfect by: bde
# c21410e1	17-May-1998	Poul-Henning Kamp <phk@FreeBSD.org>	s/nanoruntime/nanouptime/g s/microruntime/microuptime/g Reviewed by: bde
# b90dcc0c	04-Apr-1998	Peter Wemm <peter@FreeBSD.org>	Fix previous commit. Don't people read compiler messages or something??
# 00af9731	04-Apr-1998	Poul-Henning Kamp <phk@FreeBSD.org>	Time changes mark 2: * Figure out UTC relative to boottime. Four new functions provide time relative to boottime. * move "runtime" into struct proc. This helps fix the calcru() problem in SMP. * kill mono_time. * add timespec{add\|sub\|cmp} macros to time.h. (XXX: These may change!) * nanosleep, select & poll takes long sleeps one day at a time Reviewed by: bde Tested by: ache and others
# 644d85f4	04-Mar-1998	Peter Dufault <dufault@FreeBSD.org>	Reviewed by: msmith, bde long ago Fix for RTPRIO scheduler to eliminate invalid context switches. POSIX.4 headers and sysctl variables. Nothing should change unless POSIX4 is defined or _POSIX_VERSION is set to 199309.
# 303b270b	08-Feb-1998	Eivind Eklund <eivind@FreeBSD.org>	Staticize.
# 15406740	04-Feb-1998	David Greenman <dg@FreeBSD.org>	Restrict idleprio to superuser: Realtime priority has to be restricted for reasons which should be obvious. However, for idle priority, there is a potential for system deadlock if an idleprio process gains a lock on a resource that other processes need (and the idleprio process can't run due to a CPU-bound normal process). Fix me! XXX PR: 5639
# ffbb164e	18-Jan-1998	Bruce Evans <bde@FreeBSD.org>	Set p_retval for the correct process in getpriority(). This fixes a null pointer panic when the pointer for the incorrect process is NULL. getpriority() was broken in rev.1.27. Rev.1.28 broke the warning instead of fixing the problem. PR: 5495
# 5591b823d	16-Dec-1997	Eivind Eklund <eivind@FreeBSD.org>	Make COMPAT_43 and COMPAT_SUNOS new-style options.
# 4a11ca4e	07-Nov-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Remove a bunch of variables which were unused both in GENERIC and LINT. Found by: -Wunused
# cb226aaa	06-Nov-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Move the "retval" (3rd) parameter from all syscall functions and put it in struct proc instead. This fixes a boatload of compiler warning, and removes a lot of cruft from the sources. I have not removed the /ARGSUSED/, they will require some looking at. libkvm, ps and other userland struct proc frobbing programs will need recompiled.
# 1d9655ae	25-Aug-1997	Bruce Evans <bde@FreeBSD.org>	Print more info in the "calcru: negative time" message.
# 477a642c	26-Apr-1997	Peter Wemm <peter@FreeBSD.org>	Man the liferafts! Here comes the long awaited SMP -> -current merge! There are various options documented in i386/conf/LINT, there is more to come over the next few days. The kernel should run pretty much "as before" without the options to activate SMP mode. There are a handful of known "loose ends" that need to be fixed, but have been put off since the SMP kernel is in a moderately good condition at the moment. This commit is the result of the tinkering and testing over the last 14 months by many people. A special thanks to Steve Passe for implementing the APIC code!
# 6875d254	22-Feb-1997	Peter Wemm <peter@FreeBSD.org>	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
# 996c772f	09-Feb-1997	John Dyson <dyson@FreeBSD.org>	This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
# 1130b656	14-Jan-1997	Jordan K. Hubbard <jkh@FreeBSD.org>	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# e9822d92	22-Dec-1996	Joerg Wunsch <joerg@FreeBSD.org>	Make DFLDSIZ and MAXDSIZ fully-supported options. "Don't forget to do a ``make depend''" :-)
# 604396ff	08-Jun-1996	Bruce Evans <bde@FreeBSD.org>	Fixed accumulation of run time for processes that don't accumulate any statclock ticks. Pretend that all the time up to the first statclock tick is system time. . This makes a difference mainly for benchmarks that test short-lived processes - the user and system times for processes that each lived for about 1ms only added up to about 10% of the real time even when there was very little interrupt activity. Break the printing of a quad_t variable correctly.
# edbfedac	11-Mar-1996	Peter Wemm <peter@FreeBSD.org>	Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all files are off the vendor branch, so this should not change anything. A "U" marker generally means that the file was not changed in between the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally means that there was a change. [note new unused (in this form) syscalls.conf, to be 'cvs rm'ed]
# ec3f61ff	10-Mar-1996	Jeffrey Hsu <hsu@FreeBSD.org>	From Lite2: proc LIST changes stylistic changes to function prototypes Reviewed by: david & bde
# 505ae68e	16-Jan-1996	Poul-Henning Kamp <phk@FreeBSD.org>	Fix a printf, well, actually break it, that is... We don't have the ability to print 64bit things yet...
# efeaf95a	06-Dec-1995	David Greenman <dg@FreeBSD.org>	Untangled the vm.h include file spaghetti.
# d2d3e875	11-Nov-1995	Bruce Evans <bde@FreeBSD.org>	Included <sys/sysproto.h> to get central declarations for syscall args structs and prototypes for syscalls. Ifdefed duplicated decentralized declarations of args structs. It's convenient to have this visible but they are hard to maintain. Some are already different from the central declarations. 4.4lite2 puts them in comments in the function headers but I wanted to avoid the large changes for that.
# cf7e5eab	10-Nov-1995	Bruce Evans <bde@FreeBSD.org>	Fixed types of rtprio(), osetrlimit() and setrlimit(). The args struct tag and/or member names conflicted with the machine generated ones in <sys/sysproto.h>.
# 8159bae9	23-Oct-1995	Bruce Evans <bde@FreeBSD.org>	Fix a sign extension bug that was unleashed by the previous change. The total process time was sometimes 2^32 usec too large but that wasn't a problem before because the time was bogusly truncated mod 2^32.
# bcee717b	21-Oct-1995	Bruce Evans <bde@FreeBSD.org>	Avoid overflow in calcru(). Fixes PR 788. Submitted by: imdave@synet.net (Dave Bodenstab)
# 9b2e5354	30-May-1995	Rodney W. Grimes <rgrimes@FreeBSD.org>	Remove trailing whitespace.
# e6373c9e	20-Feb-1995	Guido van Rooij <guido@FreeBSD.org>	Implement maxprocperuid and maxfilesperproc. They are tunable via sysctl(8). The initial value of maxprocperuid is maxproc-1, that of maxfilesperproc is maxfiles (untill maxfile will disappear) Now it is at least possible to prohibit one user opening maxfiles -Guido Submitted by: Obtained from:
# a81b757a	06-Dec-1994	Bruce Evans <bde@FreeBSD.org>	Don't allow negative limits at all. Convert them to RLIM_INFINITY instead of returning EINVAL since something may depend on them being broken. Allowing negative limits caused bugs almost everywhere. The recent fixes for MAXSSIZ checked the limits too late to stop anyone defeating limits set by root...
# 06935b59	02-Dec-1994	Andreas Schulz <ats@FreeBSD.org>	Add one forgotten u_quad_t typecast in dosetrlimit.
# a7d72265	01-Dec-1994	Andreas Schulz <ats@FreeBSD.org>	The values for setrlimit in the data size and stack size case are used as an address value. Then all comparisons should be done unsigned and not signed. Fix it with a typecast of u_quad_t. Error can be demonstrated with the current bash in port, do a ulimit -s unlimited and the machine hangs. bash delivers through an internal error a large negative value for the stacksize, the comparison saw this smaller than MAXSSIZ and then tried to expand the stack to this size.
# d93f860c	09-Oct-1994	Poul-Henning Kamp <phk@FreeBSD.org>	Cosmetics. related to getting prototypes into view.
# 7216391e	01-Oct-1994	David Greenman <dg@FreeBSD.org>	"idle priority" support. Based on code from Henrik Vestergaard Draboel, but substantially rewritten by me.
# bb56ec4a	25-Sep-1994	Poul-Henning Kamp <phk@FreeBSD.org>	While in the real world, I had a bad case of being swapped out for a lot of cycles. While waiting there I added a lot of the extra ()'s I have, (I have never used LISP to any extent). So I compiled the kernel with -Wall and shut up a lot of "suggest you add ()'s", removed a bunch of unused var's and added a couple of declarations here and there. Having a lap-top is highly recommended. My kernel still runs, yell at me if you kernel breaks.
# e8fb0b2c	31-Aug-1994	David Greenman <dg@FreeBSD.org>	Realtime priority scheduling support. Submitted by: Henrik Vestergaard Draboel
# 3c4dd356	02-Aug-1994	David Greenman <dg@FreeBSD.org>	Added $Id$
# 26f9a767	25-May-1994	Rodney W. Grimes <rgrimes@FreeBSD.org>	The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
# df8bae1d	24-May-1994	Rodney W. Grimes <rgrimes@FreeBSD.org>	BSD 4.4 Lite Kernel Sources