History log of /freebsd-current/sys/kern/kern_resource.c
Revision Date Author Comments
# fdafd315 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 6419ed7e 06-Jul-2023 Doug Moore <dougm@FreeBSD.org>

inline_fls: drop compile-time check

HAVE_INLINE_FLSLL is #defined always. This change assumes that where
__HAVE_INLINE_FLSLL is tested, the two leading underscores are a
mistake, and that the code will be better for using the efficient
flsll implementation.
Reviewed by: markj, mhorne
Differential Revision: https://reviews.freebsd.org/D40705


# bbe62559 19-Aug-2022 Mateusz Guzik <mjg@FreeBSD.org>

rlimit: line up with other clean up in thread_reap_domain

NFC


# f2eb09b0 26-Jul-2022 Dimitry Andric <dim@FreeBSD.org>

Adjust function definitions in kern_resource.c to avoid clang 15 warnings

With clang 15, the following -Werror warnings are produced:

sys/kern/kern_resource.c:1212:10: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
lim_alloc()
^
void
sys/kern/kern_resource.c:1365:11: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
uihashinit()
^
void

This is because lim_alloc() and uihashinit() are declared with (void)
argument lists, but defined with empty argument lists. Make the
definitions match the declarations.

MFC after: 3 days


# becaf643 14-Feb-2022 John Baldwin <jhb@FreeBSD.org>

Use vmspace->vm_stacktop in place of sv_usrstack in more places.

Reviewed by: markj
Obtained from: CheriBSD
Differential Revision: https://reviews.freebsd.org/D34174


# 93288e24 01-Feb-2022 Mateusz Guzik <mjg@FreeBSD.org>

Employ thread_cow_synced in setrlimit

In order to avoid proc lock/unlock on next kernel entry.


# 8a0cb04d 01-Feb-2022 Mateusz Guzik <mjg@FreeBSD.org>

Add lim_cowsync, similar to crcowsync


# 5a8413e7 17-Jan-2022 Mark Johnston <markj@FreeBSD.org>

setrlimit: Remove special handling for RLIMIT_STACK with a stack gap

This will not be required with a forthcoming reimplementation of ASLR
stack randomization. Moreover, this change was not sufficient to enable
the use of a stack size limit smaller than the stack gap itself.

PR: 260303
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704


# bacb140f 14-Jan-2022 Simon J. Gerraty <sjg@FreeBSD.org>

Ignore calcru: runtime went backwards for vm_guest

VM's have little control over CPU speed, don't make matters worse
by constantly spaming console.

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D33902


# a9545eed 09-Dec-2021 Florian Walpen <dev@submerge.ch>

Add idle priority scheduling privilege group to MAC/priority

Add an idletime user group that allows non-root users to run processes
with idle scheduling priority. Privileges are granted by a MAC policy in
the mac_priority module. For this purpose, the kernel privilege
PRIV_SCHED_IDPRIO was added to sys/priv.h (kernel module ABI change).

Deprecate the system wide sysctl(8) knob
security.bsd.unprivileged_idprio which lets any user run idle priority
processes, regardless of context. While the knob is still working, it is
marked as deprecated in the description and in the man pages.

MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D33338


# a20a2450 09-Dec-2021 Florian Walpen <dev@submerge.ch>

Add PRIV_SCHED_IDPRIO

The privilege allows the holder to assign idle priority type to thread
or process.

MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D33338


# 638c5fa8 29-Nov-2021 Brooks Davis <brooks@FreeBSD.org>

syscalls: normalize (get|set)rlimit

Declare normal <foo>_args structs rather than going out of the way
to declare __<foo>_args.

Reviewed by: kib, imp


# 889b56c8 13-Oct-2021 Dawid Gorecki <dgr@semihalf.com>

setrlimit: Take stack gap into account.

Calling setrlimit with stack gap enabled and with low values of stack
resource limit often caused the program to abort immediately after
exiting the syscall. This happened due to the fact that the resource
limit was calculated assuming that the stack started at sv_usrstack,
while with stack gap enabled the stack is moved by a random number
of bytes.

Save information about stack size in struct vmspace and adjust the
rlim_cur value. If the rlim_cur and stack gap is bigger than rlim_max,
then the value is truncated to rlim_max.

PR: 253208
Reviewed by: kib
Obtained from: Semihalf
Sponsored by: Stormshield
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D31516


# af29f399 28-Jul-2021 Dmitry Chagin <dchagin@FreeBSD.org>

umtx: Split umtx.h on two counterparts.

To prevent umtx.h polluting by future changes split it on two headers:
umtx.h - ABI header for userspace;
umtxvar.h - the kernel staff.

While here fix umtx_key_match style.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31248
MFC after: 2 weeks


# 0659df6f 12-Jan-2021 Konstantin Belousov <kib@FreeBSD.org>

vm_map_protect: allow to set prot and max_prot in one go.

This prevents a situation where other thread modifies map entries
permissions between setting max_prot, then relocking, then setting prot,
confusing the operation outcome. E.g. you can get an error that is not
possible if operation is performed atomic.

Also enable setting rwx for max_prot even if map does not allow to set
effective rwx protection.

Reviewed by: brooks, markj (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28117


# fb8ab680 14-Nov-2020 Mateusz Guzik <mjg@FreeBSD.org>

thread: batch resource limit free calls


# 5c5ca843 10-Nov-2020 Mateusz Guzik <mjg@FreeBSD.org>

Allow rtprio_thread to operate on threads of any process

This in particular unbreaks rtkit.

The limitation was a leftover of previous state, to quote a
comment:

/*
* Though lwpid is unique, only current process is supported
* since there is no efficient way to look up a LWP yet.
*/

Long since then a global tid hash was introduced to remedy
the problem.

Permission checks still apply.

Submitted by: greg_unrelenting.technology (Greg V)
Differential Revision: https://reviews.freebsd.org/D27158


# 6fed89b1 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

kern: clean up empty lines in .c and .h files


# 3ff65f71 30-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

Remove duplicated empty lines from kern/*.c

No functional changes.


# ca603bb1 12-Jan-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

dd kern_getpriority(), make Linuxulator use it.

Reviewed by: kib, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22842


# 7a0ef283 12-Jan-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add kern_setpriority(), use it in Linuxulator.

Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22841


# 61a74c5c 15-Dec-2019 Jeff Roberson <jeff@FreeBSD.org>

schedlock 1/4

Eliminate recursion from most thread_lock consumers. Return from
sched_add() without the thread_lock held. This eliminates unnecessary
atomics and lock word loads as well as reducing the hold time for
scheduler locks. This will eventually allow for lockless remote adds.

Discussed with: kib
Reviewed by: jhb
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22626


# f5526339 17-Mar-2019 Andrew Gallatin <gallatin@FreeBSD.org>

Fix a typo introduced in r344133

The line was misedited to change tt to st instead of
changing ut to st.

The use of st as the denominator in mul64_by_fraction() will lead
to an integer divide fault in the intr proc (the process holding
ithreads) where st will be 0. This divide by 0 happens after
the total runtime for all ithreads exceeds 76 hours.

Submitted by: bde


# 23e5e43c 14-Feb-2019 Bruce Evans <bde@FreeBSD.org>

Finish the fix for overflow in calcru1().

The previous fix was unnecessarily very slow up to 105 hours where the
simple formula used previously worked, and unnecessarily slow by a factor
of about 5/3 up to 388 days, and didn't work above 388 days. 388 days is
not a long time, since it is a reasonable uptime, and for processes the
times being calculated are aggregated over all threads, so with N CPUs
running the same thread a runtime of 388 days is reachable after only
388 / N physical days.

The PRs document overflow at 388 days, but don't try to fix it.

Use the simple formula up to 76 hours. Then use a complicated general
method that reduces to the simple formula up to a bit less than 105
hours, then reduces to the previous method without its extra work up
to almost 388 days, then does more complicated reductions, usually
many bits at a time so that this is not slow. This works up to half
of maximum representable time (292271 years), with accumulated rounding
errors of at most 32 usec.

amd64 can do all this with no avoidable rounding errors in an inline
asm with 2 instructions, but this is too special to use. __uint128_t
can do the same with 100's of instructions on 64-bit arches. Long
doubles with at least 64 bits of precision are the easiest method to
use on i386 userland, but are hard to use in the kernel.

PR: 76972 and duplicates
Reviewed by: kib


# e0d164c7 10-Feb-2019 Conrad Meyer <cem@FreeBSD.org>

Prevent overflow for usertime/systime in caclru1

PR: 76972 and duplicates
Reported by: Dr. Christopher Landauer <cal AT aero.org>,
Steinar Haug <sthaug AT nethelp.no>
Submitted by: Andrey Zonov <andrey AT zonov.org> (earlier version)
MFC after: 2 weeks


# 73e62bc9 10-Dec-2018 Mateusz Guzik <mjg@FreeBSD.org>

Make lim_cur inline if possible.

It is a function call only to accomodate *some* ABIs which install a hook.
They only care for 3 types of limits: DATA, STACK, VMEM

Instead of always calling the func, see at compilation time if the requested
limit is something else and just do the read if so.

Sponsored by: The FreeBSD Foundation


# 6ff4688b 07-Dec-2018 Mateusz Guzik <mjg@FreeBSD.org>

Replace hand-rolled unrefs if > 1 with refcount_release_if_not_last

Sponsored by: The FreeBSD Foundation


# e8bb589d 04-Oct-2018 Matt Macy <mmacy@FreeBSD.org>

eliminate locking surrounding ui_vmsize and swap reserve by using atomics

Change swap_reserve and swap_total to be in units of pages so that
swap reservations can be done using only atomics instead of using a single
global mutex for swap_reserve and a single mutex for all processes running
under the same uid for uid accounting.

Results in mmap speed up and a 70% increase in brk calls / second.

Reviewed by: alc@, markj@, kib@
Approved by: re (delphij@)
Differential Revision: https://reviews.freebsd.org/D16273


# 6469bdcd 06-Apr-2018 Brooks Davis <brooks@FreeBSD.org>

Move most of the contents of opt_compat.h to opt_global.h.

opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941


# 2da93c21 04-Jan-2018 John Baldwin <jhb@FreeBSD.org>

Always use atomic_fetchadd() when updating per-user accounting values.

This avoids re-reading a variable after it has been updated via an
atomic op. It is just a cosmetic cleanup as the read value was only
used to control a diagnostic printf that should rarely occur (if ever).

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D13768


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# 5949c7e5 31-Oct-2017 Mateusz Guzik <mjg@FreeBSD.org>

Save on uihash table locking by checking if the caller already uses the struct

In particular with poudriere this saves about 90% of lookups.


# 3e85b721 16-May-2017 Ed Maste <emaste@FreeBSD.org>

Remove register keyword from sys/ and ANSIfy prototypes

A long long time ago the register keyword told the compiler to store
the corresponding variable in a CPU register, but it is not relevant
for any compiler used in the FreeBSD world today.

ANSIfy related prototypes while here.

Reviewed by: cem, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D10193


# 69a28758 15-Sep-2016 Ed Maste <emaste@FreeBSD.org>

Renumber license clauses in sys/kern to avoid skipping #3


# 1bdbd705 28-Feb-2016 Konstantin Belousov <kib@FreeBSD.org>

Implement process-shared locks support for libthr.so.3, without
breaking the ABI. Special value is stored in the lock pointer to
indicate shared lock, and offline page in the shared memory is
allocated to store the actual lock.

Reviewed by: vangyzen (previous version)
Discussed with: deischen, emaste, jhb, rwatson,
Martin Simmons <martin@lispworks.com>
Tested by: pho
Sponsored by: The FreeBSD Foundation


# 20b4c1d2 22-Dec-2015 Enji Cooper <ngie@FreeBSD.org>

Fold lim_shared into lim_copy to mute a -Wunused compiler warning from
clang when the kernel is compiled without INVARIANTS

Differential Revision: https://reviews.freebsd.org/D4683
Reviewed by: kib, jhb
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division


# 15db3c07 14-Nov-2015 Edward Tomasz Napierala <trasz@FreeBSD.org>

Speed up rctl operation with large rulesets, by holding the lock
during iteration instead of relocking it for each traversed rule.

Reviewed by: mjg@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D4110


# cd672ca6 16-Jul-2015 Mateusz Guzik <mjg@FreeBSD.org>

Get rid of lim_update_thread and cred_update_thread.

Their primary use was in thread_cow_update to free up old resources.
Freeing had to be done with proc lock held and _cow_ funcs already knew
how to free old structs.


# 7150ce74 24-Jun-2015 Mateusz Guzik <mjg@FreeBSD.org>

rlimit: deduplicate code in chg* functions


# f6f6d240 10-Jun-2015 Mateusz Guzik <mjg@FreeBSD.org>

Implement lockless resource limits.

Use the same scheme implemented to manage credentials.

Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.


# 316b3843 15-Apr-2015 Konstantin Belousov <kib@FreeBSD.org>

Implement support for binary to requesting specific stack size for the
initial thread. It is read by the ELF image activator as the virtual
size of the PT_GNU_STACK program header entry, and can be specified by
the linker option -z stack-size in newer binutils.

The soft RLIMIT_STACK is auto-increased if possible, to satisfy the
binary' request.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 5c7bebf9 26-Nov-2014 Konstantin Belousov <kib@FreeBSD.org>

The process spin lock currently has the following distinct uses:

- Threads lifetime cycle, in particular, counting of the threads in
the process, and interlocking with process mutex and thread lock.
The main reason of this is that turnstile locks are after thread
locks, so you e.g. cannot unlock blockable mutex (think process
mutex) while owning thread lock.

- Virtual and profiling itimers, since the timers activation is done
from the clock interrupt context. Replace the p_slock by p_itimmtx
and PROC_ITIMLOCK().

- Profiling code (profil(2)), for similar reason. Replace the p_slock
by p_profmtx and PROC_PROFLOCK().

- Resource usage accounting. Need for the spinlock there is subtle,
my understanding is that spinlock blocks context switching for the
current thread, which prevents td_runtime and similar fields from
changing (updates are done at the mi_switch()). Replace the p_slock
by p_statmtx and PROC_STATLOCK().

The split is done mostly for code clarity, and should not affect
scalability.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# dff9862c 23-Nov-2014 Mateusz Guzik <mjg@FreeBSD.org>

ifdef RACCT ui_racct_foreach and struct uidinfo's ui_racct

Change racct_ create and destroy to macros evaluating to nothing without RACCT
so that their callers passing ui_racct don't have to be ifdefed.


# 8d866b10 27-Oct-2014 Mateusz Guzik <mjg@FreeBSD.org>

Tidy up functions related to uidinfo management.

- reference found uidinfo in uilookup
- reduce nesting by handling shorter cases first


# 8101958c 27-Oct-2014 Mateusz Guzik <mjg@FreeBSD.org>

De-k&r-ify function definitions in kern/kern_resource.c

No functional changes.


# 675c3507 24-Oct-2014 Mateusz Guzik <mjg@FreeBSD.org>

rlimit: plug duplicate assertion

counter sanity is already checked by refcount_release.


# c2a48c0d 13-Dec-2013 Mateusz Guzik <mjg@FreeBSD.org>

rlimit: avoid unnecessary copying of rlimits

If refcount is 1 just modify rlimits in place.

MFC after: 2 weeks


# 3318a9c8 13-Dec-2013 Mateusz Guzik <mjg@FreeBSD.org>

rlimit: add and utilize lim_shared

MFC after: 2 weeks


# 9110db81 21-Oct-2013 Konstantin Belousov <kib@FreeBSD.org>

Add a resource limit for the total number of kqueues available to the
user. Kqueue now saves the ucred of the allocating thread, to
correctly decrement the counter on close.

Under some specific and not real-world use scenario for kqueue, it is
possible for the kqueues to consume memory proportional to the square
of the number of the filedescriptors available to the process. Limit
allows administrator to prevent the abuse.

This is kernel-mode side of the change, with the user-mode enabling
commit following.

Reported and tested by: pho
Discussed with: jmg
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 9a2bff7c 06-Mar-2013 Ian Lepore <ian@FreeBSD.org>

Call sched_prio() to immediately change the priority of the thread in
response to an rtprio_thread() call, when the priority is different
than the old priority, and either the old or the new priority class is
not RTP_PRIO_NORMAL (timeshare).

The reasoning for the second half of the test is that if it's a change in
timeshare priority, then the scheduler is going to adjust that priority
in a way that completely wipes out the requested change anyway, so
what's the point? (If that's not true, then allowing a thread to change
its own timeshare priority would subvert the scheduler's adjustments and
let a cpu-bound thread monopolize the cpu; if allowed at all, that
should require priveleges.)

On the other hand, if either the old or new priority class is not
timeshare, then the scheduler doesn't make automatic adjustments, so we
should honor the request and make the priority change right away. The
reason the old class gets caught up in this is the very reason for this
change: when thread A changes the priority of its child thread B from
idle back to timeshare, thread B never actually gets moved to a
timeshare-range run queue unless there are some idle cycles available
to allow it to first get scheduled again as an idle thread.

Reviewed by: jhb@


# 4601bab1 04-Mar-2013 Davide Italiano <davide@FreeBSD.org>

MFcalloutng (r244251 with minor changes):
Specify that precision of 0.5s is enough for resource limitation.

Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo, marius, ian, markj, Fabian Keil


# 8854fe39 22-Jan-2012 Mikolaj Golub <trociny@FreeBSD.org>

Change kern.proc.rlimit sysctl to:

- retrive only one, specified limit for a process, not the whole
array, as it was previously (the sysctl has been added recently and
has not been backported to stable yet, so this change is ok);

- allow to set a resource limit for another process.

Submitted by: Andrey Zonov <andrey at zonov.org>
Discussed with: kib
Reviewed by: kib
MFC after: 2 weeks


# 948c4609 05-Jan-2012 John Baldwin <jhb@FreeBSD.org>

Fix a logic bug in change 228207 in the check for a thread's new user
priority being a realtime priority.

MFC after: 3 days


# 9910b854 13-Dec-2011 Eitan Adler <eadler@FreeBSD.org>

- Add a sysctl to allow non-root users the ability to set idle
priorities.

- While here fix up some style nits.

Discussed with: cperciva (breifly)
Reviewed by: pjd (earlier version)
Reviewed by: bde
Approved by: jhb
MFC after: 1 month


# 593dd43e 02-Dec-2011 John Baldwin <jhb@FreeBSD.org>

When changing the user priority of a thread, change the real priority
in addition to the user priority for threads whose current real priority
is equal to the previous user priority or if the new priority is a
real-time priority. This allows priority changes of other threads to
have an immediate effect.

MFC after: 2 weeks


# bde886fb 07-Nov-2011 Mikolaj Golub <trociny@FreeBSD.org>

In lim_fork() assert that processes locks are held.

Suggested by: kib


# 8451d0dd 16-Sep-2011 Kip Macy <kmacy@FreeBSD.org>

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


# 2417d97e 18-Jul-2011 John Baldwin <jhb@FreeBSD.org>

- Export each thread's individual resource usage in in struct kinfo_proc's
ki_rusage member when KERN_PROC_INC_THREAD is passed to one of the
process sysctls.
- Correctly account for the current thread's cputime in the thread when
doing the runtime fixup in calcru().
- Use TIDs as the key to lookup the previous thread to compute IO stat
deltas in IO mode in top when thread display is enabled.

Reviewed by: kib
Approved by: re (kib)


# e806d352 06-Apr-2011 John Baldwin <jhb@FreeBSD.org>

Fix several places to ignore processes that are not yet fully constructed.

MFC after: 1 week


# 097055e2 29-Mar-2011 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add racct. It's an API to keep per-process, per-jail, per-loginclass
and per-loginclass resource accounting information, to be used by the new
resource limits code. It's connected to the build, but the code that
actually calls the new functions will come later.

Sponsored by: The FreeBSD Foundation
Reviewed by: kib (earlier version)


# 8e6fa660 24-Mar-2011 John Baldwin <jhb@FreeBSD.org>

Fix some locking nits with the p_state field of struct proc:
- Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL
in fork to honor the locking requirements. While here, expand the scope
of the PROC_LOCK() on the new process (p2) to avoid some LORs. Previously
the code was locking the new child process (p2) after it had locked the
parent process (p1). However, when locking two processes, the safe order
is to lock the child first, then the parent.
- Fix various places that were checking p_state against PRS_NEW without
having the process locked to use PROC_LOCK(). Every place was already
locking the process, just after the PRS_NEW check.
- Remove or reduce the use of PROC_SLOCK() for places that were checking
p_state against PRS_NEW. The PROC_LOCK() alone is sufficient for reading
the current state.
- Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once.

MFC after: 1 week


# c8e368a9 29-Dec-2010 David Xu <davidxu@FreeBSD.org>

- Follow r216313, the sched_unlend_user_prio is no longer needed, always
use sched_lend_user_prio to set lent priority.
- Improve pthread priority-inherit mutex, when a contender's priority is
lowered, repropagete priorities, this may cause mutex owner's priority
to be lowerd, in old code, mutex owner's priority is rise-only.


# 36b4cd24 17-Dec-2010 John Baldwin <jhb@FreeBSD.org>

Add back a bounds check on valid idle priorities that was lost in an
earlier commit. While here, move the thread lock down in rtp_to_pri().
It is not needed for all of the priority value checks and the computation
of newpri.

Reported by: swell.k @ gmail
MFC after: 3 days


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# c4965cfc 18-Oct-2010 Ed Maste <emaste@FreeBSD.org>

We've already set p = td->td_proc, so use it.


# cf7d9a8c 08-Oct-2010 David Xu <davidxu@FreeBSD.org>

Create a global thread hash table to speed up thread lookup, use
rwlock to protect the table. In old code, thread lookup is done with
process lock held, to find a thread, kernel has to iterate through
process and thread list, this is quite inefficient.
With this change, test shows in extreme case performance is
dramatically improved.

Earlier patch was reviewed by: jhb, julian


# 1a996ed1 18-Jul-2010 Edward Tomasz Napierala <trasz@FreeBSD.org>

Revert r210225 - turns out I was wrong; the "/*-" is not license-only
thing; it's also used to indicate that the comment should not be automatically
rewrapped.

Explained by: cperciva@


# 805cc58a 18-Jul-2010 Edward Tomasz Napierala <trasz@FreeBSD.org>

The "/*-" comment marker is supposed to denote copyrights. Remove non-copyright
occurences from sys/sys/ and sys/kern/.


# eea4ac8b 18-Jul-2010 Edward Tomasz Napierala <trasz@FreeBSD.org>

Remove outdated comment and move part of it into more applicable place.


# 60ae52f7 21-Jun-2010 Ed Schouten <ed@FreeBSD.org>

Use ISO C99 integer types in sys/kern where possible.

There are only about 100 occurences of the BSD-specific u_int*_t
datatypes in sys/kern. The ISO C99 integer types are used here more
often.


# f3e1e28b 26-May-2010 Konstantin Belousov <kib@FreeBSD.org>

MFC r208488:
Fix the double counting of the last process thread td_incruntime
on exit, that is done once in thread_exit() and the second time in
proc_reap(), by clearing td_incruntime.

Approved by: re (kensmith)


# 41fd9c63 24-May-2010 Konstantin Belousov <kib@FreeBSD.org>

Fix the double counting of the last process thread td_incruntime
on exit, that is done once in thread_exit() and the second time in
proc_reap(), by clearing td_incruntime.

Use the opportunity to revert to the pre-RUSAGE_THREAD exporting of ruxagg()
instead of ruxagg_locked() and use it from thread_exit().

Diagnosed and tested by: neel
MFC after: 3 days


# c193de56 11-May-2010 Konstantin Belousov <kib@FreeBSD.org>

MFC r207468:
Extract thread_lock()/ruxagg()/thread_unlock() fragment into utility
function ruxagg_tlock().
Convert the definition of kern_getrusage() to ANSI C.

MFC r207602:
Implement RUSAGE_THREAD. Add td_rux to keep extended runtime and ticks
information for thread to allow calcru1() (re)use.

Rename ruxagg()->ruxagg_locked(), ruxagg_tlock()->ruxagg() [1].
The ruxagg_locked() function no longer clears thread ticks nor
td_incruntime.

Not an MFC: the td_rux is added to the end of struct thread to keep
the KBI. Explicit bzero() of td_rux is added to new thread initialization
points.


# bed4c524 03-May-2010 Konstantin Belousov <kib@FreeBSD.org>

Implement RUSAGE_THREAD. Add td_rux to keep extended runtime and ticks
information for thread to allow calcru1() (re)use.

Rename ruxagg()->ruxagg_locked(), ruxagg_tlock()->ruxagg() [1].
The ruxagg_locked() function no longer clears thread ticks nor
td_incruntime.

Requested by: attilio [1]
Discussed with: attilio, bde
Reviewed by: bde
Based on submission by: Alexander Krizhanovsky <ak natsys-lab com>
MFC after: 1 week
X-MFC-Note: td_rux shall be moved to the end of struct thread


# 3087dc40 01-May-2010 Konstantin Belousov <kib@FreeBSD.org>

Extract thread_lock()/ruxagg()/thread_unlock() fragment into utility
function ruxagg_tlock().
Convert the definition of kern_getrusage() to ANSI C.

Submitted by: Alexander Krizhanovsky <ak natsys-lab com>
MFC after: 1 week


# 9b355dc7 05-Apr-2010 Randall Stewart <rrs@FreeBSD.org>

MFC of 204670:
-------------------------
sched_getparam was just plain broke for time-share
processes. It did not return an error but instead
just let garbage be passed back. This I fix so
it actually properly translates the priority the
process is at to a posix's high means more priority.
I also fix it so that if the ULE scheduler has bumped
it up to a realtime process you get back a sane value
i.e. the highest priority (63 for time-share).

sched_setscheduler() had the setting of the
timeshare class priority disabled. With some notes
about rejecting the posix high numbers is greater
priority and use nice instead. This fix also
adjusts that to work, with the cavet that a t-s
process may well get bumped up or down i.e. the
setscheduler() will NOT change the nice value only
the current priority. I think this is reasonable
considering if the user wants to play with nice then
he can. At least all the posix'ish interfaces now
respond sanely.
-----------------------


# bec67fd3 03-Mar-2010 Randall Stewart <rrs@FreeBSD.org>

sched_getparam was just plain broke for time-share
processes. It did not return an error but instead
just let garbage be passed back. This I fix so
it actually properly translates the priority the
process is at to a posix's high means more priority.
I also fix it so that if the ULE scheduler has bumped
it up to a realtime process you get back a sane value
i.e. the highest priority (63 for time-share).

sched_setscheduler() had the setting of the
timeshare class priority disabled. With some notes
about rejecting the posix high numbers is greater
priority and use nice instead. This fix also
adjusts that to work, with the cavet that a t-s
process may well get bumped up or down i.e. the
setscheduler() will NOT change the nice value only
the current priority. I think this is reasonable
considering if the user wants to play with nice then
he can. At least all the posix'ish interfaces now
respond sanely.

MFC after: 3 weeks


# 3364c323 23-Jun-2009 Konstantin Belousov <kib@FreeBSD.org>

Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with: pho
Reviewed by: alc
Approved by: re (kensmith)


# 300fa5ef 23-Oct-2008 David Xu <davidxu@FreeBSD.org>

Don't rearm callout if the process is exiting, it may leak a callout
because callout_drain() only waits for running callout, but not disable
it if it is rearmed.


# 1ede983c 23-Oct-2008 Dag-Erling Smørgrav <des@FreeBSD.org>

Retire the MALLOC and FREE macros. They are an abomination unto style(9).

MFC after: 3 months


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# c27991e8 05-Sep-2008 Ed Schouten <ed@FreeBSD.org>

Fix a small typo in a comment in calcru1().

The word "happene" should read "happened".

Submitted by: Jille Timmermans <jille quis cx>


# bc093719 20-Aug-2008 Ed Schouten <ed@FreeBSD.org>

Integrate the new MPSAFE TTY layer to the FreeBSD operating system.

The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:

- Improved driver model:

The old TTY layer has a driver model that is not abstract enough to
make it friendly to use. A good example is the output path, where the
device drivers directly access the output buffers. This means that an
in-kernel PPP implementation must always convert network buffers into
TTY buffers.

If a PPP implementation would be built on top of the new TTY layer
(still needs a hooks layer, though), it would allow the PPP
implementation to directly hand the data to the TTY driver.

- Improved hotplugging:

With the old TTY layer, it isn't entirely safe to destroy TTY's from
the system. This implementation has a two-step destructing design,
where the driver first abandons the TTY. After all threads have left
the TTY, the TTY layer calls a routine in the driver, which can be
used to free resources (unit numbers, etc).

The pts(4) driver also implements this feature, which means
posix_openpt() will now return PTY's that are created on the fly.

- Improved performance:

One of the major improvements is the per-TTY mutex, which is expected
to improve scalability when compared to the old Giant locking.
Another change is the unbuffered copying to userspace, which is both
used on TTY device nodes and PTY masters.

Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.

Obtained from: //depot/projects/mpsafetty/...
Approved by: philip (ex-mentor)
Discussed: on the lists, at BSDCan, at the DevSummit
Sponsored by: Snow B.V., the Netherlands
dcons(4) fixed by: kan


# 4682cd0b 19-Mar-2008 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Remove extra uihold() call that accidentally sneak in during perforce
change @125544.


# 374ae2a3 19-Mar-2008 Jeff Roberson <jeff@FreeBSD.org>

- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from
requiring the per-process spinlock to only requiring the process lock.
- Reflect these changes in the proc.h documentation and consumers throughout
the kernel. This is a substantial reduction in locking cost for these
fields and was made possible by recent changes to threading support.


# 709446e7 16-Mar-2008 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Whitespace cleanups.


# 1b072fbc 16-Mar-2008 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- Use wait-free method to manage ui_sbsize and ui_proccnt fields in the
uidinfo structure. This entirely removes contention observed on the
ui_mtxp mutex (as it is now gone).
- Convert the uihashtbl_mtx mutex to a rwlock, as most of the time we just
need to read-lock it.

Reviewed by: jhb, jeff, kris & others
Tested by: kris


# e0567707 16-Mar-2008 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Style fixes.


# 67e83b07 16-Mar-2008 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Fix information leak. We can find PIDs of running processes from within
a jail, etc. by simply calling setpriority(PRIO_PROCESS, <PID>, 0) and
checking the return value: 0 means that the process exists and -1 that
it doesn't exist.

Reviewed by: rwatson
MFC after: 1 week


# 6617724c 12-Mar-2008 Jeff Roberson <jeff@FreeBSD.org>

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


# d92909c1 10-Jan-2008 Robert Watson <rwatson@FreeBSD.org>

Don't zero td_runtime when billing thread CPU usage to the process;
maintain a separate td_incruntime to hold unbilled CPU usage for
the thread that has the previous properties of td_runtime.

When thread information is requested using the thread monitoring
sysctls, export thread td_runtime instead of process rusage runtime
in kinfo_proc.

This restores the display of individual ithread and other kernel
thread CPU usage since inception in ps -H and top -SH, as well for
libthr user threads, valuable debugging information lost with the
move to try kthreads since they are no longer independent processes.

There is universal agreement that we should rewrite the process and
thread export sysctls, but this commit gets things going a bit
better in the mean time. Likewise, there are resevations about the
continued validity of statclock given the speed of modern processors.

Reviewed by: attilio, emaste, jhb, julian


# 435806d3 11-Dec-2007 David Xu <davidxu@FreeBSD.org>

Fix LOR of thread lock and umtx's priority propagation mutex due
to the reworking of scheduler lock.

MFC: after 3 days


# fb62eea2 16-Jul-2007 Jeff Roberson <jeff@FreeBSD.org>

- Use ruxagg() in calcru() to make sure we have current tick information
from all threads.

Discussed with: bde, attilio
Approved by: re


# 59d8f3ff 12-Jul-2007 John Baldwin <jhb@FreeBSD.org>

Fix a couple of issues with the stack limit for 32-bit processes on 64-bit
kernels exposed by the recent fixes to resource limits for 32-bit processes
on 64-bit kernels:
- Let ABIs expose their maximum stack size via a new pointer in sysentvec
and use that in preference to maxssiz during exec() rather than always
using maxssiz for all processses.
- Apply the ABI's limit fixup to the previous stack size when adjusting
RLIMIT_STACK to determine if the existing mapping for the stack needs to
be grown or shrunk (as well as how much it should be grown or shrunk).

Approved by: re (kensmith)


# 7e273744 14-Jun-2007 Robert Watson <rwatson@FreeBSD.org>

Remove the restriction that rtprio(2) cannot be used to set the realtime
or idle priority of another process owned by the same user. This means
that privilege in rtprio(2) (and rtprio_thread(2)) is required indirectly
via p_cansched(9) or directly to set realtime/idle privilege, rather than
directly affecting target process authorization.


# 32f9753c 11-Jun-2007 Robert Watson <rwatson@FreeBSD.org>

Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.

Reviewed by: csjp
Obtained from: TrustedBSD Project


# a1fe14bc 09-Jun-2007 Attilio Rao <attilio@FreeBSD.org>

rufetch and calcru sometimes should be called atomically together.
This patch fixes places where they should be called atomically changing
their locking requirements (both assume per-proc spinlock held) and
introducing rufetchcalc which wrappers both calls to be performed in
atomic way.

Reviewed by: jeff
Approved by: jeff (mentor)


# a140976e 09-Jun-2007 Attilio Rao <attilio@FreeBSD.org>

The current rusage code show peculiar problems:
- Unsafeness on ruadd() in thread_exit()
- Unatomicity of thread_exiit() in the exit1() operations

This patch addresses these problems allocating p_fd as part of the
process and modifying the way it is accessed.

A small chunk of this patch, resolves a race about p_state in kern_wait(),
since we have to be sure about the zombif-ing process.

Submitted by: jeff
Approved by: jeff (mentor)


# 982d11f8 04-Jun-2007 Jeff Roberson <jeff@FreeBSD.org>

Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
sychronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


# 1c4bcd05 31-May-2007 Jeff Roberson <jeff@FreeBSD.org>

- Move rusage from being per-process in struct pstats to per-thread in
td_ru. This removes the requirement for per-process synchronization in
statclock() and mi_switch(). This was previously supported by
sched_lock which is going away. All modifications to rusage are now
done in the context of the owning thread. reads proceed without locks.
- Aggregate exiting threads rusage in thread_exit() such that the exiting
thread's rusage is not lost.
- Provide a new routine, rufetch() to fetch an aggregate of all rusage
structures from all threads in a process. This routine must be used
in any place requiring a rusage from a process prior to it's exit. The
exited process's rusage is still available via p_ru.
- Aggregate tick statistics only on demand via rufetch() or when a thread
exits. Tick statistics are kept in the thread and protected by sched_lock
until it exits.

Initial patch by: attilio
Reviewed by: attilio, bde (some objections), arch (mostly silent)


# e1e8f51b 27-May-2007 Robert Watson <rwatson@FreeBSD.org>

Universally adopt most conventional spelling of acquire.


# 19059a13 14-May-2007 John Baldwin <jhb@FreeBSD.org>

Rework the support for ABIs to override resource limits (used by 32-bit
processes under 64-bit kernels). Previously, each 32-bit process overwrote
its resource limits at exec() time. The problem with this approach is that
the new limits affect all child processes of the 32-bit process, including
if the child process forks and execs a 64-bit process. To fix this, don't
ovewrite the resource limits during exec(). Instead, sv_fixlimits() is
now replaced with a different function sv_fixlimit() which asks the ABI to
sanitize a single resource limit. We then use this when querying and
setting resource limits. Thus, if a 32-bit process sets a limit, then
that new limit will be inherited by future children. However, if the
32-bit process doesn't change a limit, then a future 64-bit child will
see the "full" 64-bit limit rather than the 32-bit limit.

MFC is tentative since it will break the ABI of old linux.ko modules (no
other modules are affected).

MFC after: 1 week


# 873fbcd7 05-Mar-2007 Robert Watson <rwatson@FreeBSD.org>

Further system call comment cleanup:

- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
"syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.


# 0c14ff0e 04-Mar-2007 Robert Watson <rwatson@FreeBSD.org>

Remove 'MPSAFE' annotations from the comments above most system calls: all
system calls now enter without Giant held, and then in some cases, acquire
Giant explicitly.

Remove a number of other MPSAFE annotations in the credential code and
tweak one or two other adjacent comments.


# 1ad9ee86 25-Feb-2007 Xin LI <delphij@FreeBSD.org>

Close race conditions between fork() and [sg]etpriority()'s
PRIO_USER case, possibly also other places that deferences
p_ucred.

In the past, we insert a new process into the allproc list right
after PID allocation, and release the allproc_lock sx. Because
most content in new proc's structure is not yet initialized,
this could lead to undefined result if we do not handle PRS_NEW
with care.

The problem with PRS_NEW state is that it does not provide fine
grained information about how much initialization is done for a
new process. By defination, after PRIO_USER setpriority(), all
processes that belongs to given user should have their nice value
set to the specified value. Therefore, if p_{start,end}copy
section was done for a PRS_NEW process, we can not safely ignore
it because p_nice is in this area. On the other hand, we should
be careful on PRS_NEW processes because we do not allow non-root
users to lower their nice values, and without a successful copy
of the copy section, we can get stale values that is inherted
from the uninitialized area of the process structure.

This commit tries to close the race condition by grabbing proc
mutex *before* we release allproc_lock xlock, and do copy as
well as zero immediately after the allproc_lock xunlock. This
guarantees that the new process would have its p_copy and p_zero
sections, as well as user credential informaion initialized. In
getpriority() case, instead of grabbing PROC_LOCK for a PRS_NEW
process, we just skip the process in question, because it does
not affect the final result of the call, as the p_nice value
would be copied from its parent, and we will see it during
allproc traverse.

Other potential solutions are still under evaluation.

Discussed with: davidxu, jhb, rwatson
PR: kern/108071
MFC after: 2 weeks


# 86138fc7 19-Feb-2007 Robert Watson <rwatson@FreeBSD.org>

Use priv_check(9) instead of suser(9) for checking the privilege to
set real-time priority on a thread. It looks like this suser(9)
call was introduced after my first pass through replacing superuser
checks with named privilege checks.


# 4f506694 17-Jan-2007 Xin LI <delphij@FreeBSD.org>

Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.


# ad1e7d28 05-Dec-2006 Julian Elischer <julian@FreeBSD.org>

Threading cleanup.. part 2 of several.

Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs. Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.


# fa0d3a32 19-Nov-2006 David Xu <davidxu@FreeBSD.org>

Use scheduler API sched_user_prio() to adjust thread's userland priority,
use td_base_user_prio to get real userland priority since POSIX priority
mutex may adjust td_user_pri which is an effective priority.


# acd3428b 06-Nov-2006 Robert Watson <rwatson@FreeBSD.org>

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# 8460a577 26-Oct-2006 John Birrell <jb@FreeBSD.org>

Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by: davidxu@


# 73fa3e5b 20-Sep-2006 David Xu <davidxu@FreeBSD.org>

Replace system call thr_getscheduler, thr_setscheduler, thr_setschedparam
with rtprio_thread, while rtprio system call is for process only, the new
system call rtprio_thread is responsible for LWP.


# 776fc0e9 04-Aug-2006 Yaroslav Tykhiy <ytykhiy@gmail.com>

Commit the results of the typo hunt by Darren Pilgrim.
This change affects documentation and comments only,
no real code involved.

PR: misc/101245
Submitted by: Darren Pilgrim <darren pilgrim bitfreak org>
Tested by: md5(1)
MFC after: 1 week


# 272601f8 11-Mar-2006 Poul-Henning Kamp <phk@FreeBSD.org>

Go over calcru and friends once more.

Reintroduce the monotonicity for the normal case and make the two
special cases behave in what is belived to be the most sensible fasion.


# 0f038c05 09-Mar-2006 Poul-Henning Kamp <phk@FreeBSD.org>

Add slop to "backwards" cpu accounting messages, 3 usec or 1% whichever
triggers.

This should eliminate all the trivial messages which result from minor
increases in cpu_tick frequency.

Machines which don't du cpu clock fiddling shouldn't issue "backwards"
messages now.

Laptops and other machines where the initial estimate of cputicks may be
waaaay off will still issue warnings.


# 8f95fc24 22-Feb-2006 John Baldwin <jhb@FreeBSD.org>

Various style and comment fixes.

Submitted by: bde


# 6fc6433e 21-Feb-2006 John Baldwin <jhb@FreeBSD.org>

Split calcru() back into a calcru1() function shared with calccru() and
a calcru() wrapper that passes a local rusage_ext on the stack that is
a snapshot to do the calculations on. Now we can pass p->p_crux to
calcru1() in calccru() again which fixes the issues with runtime going
backwards messages when dead processes are harvested by init.

Reviewed by: phk
Tested by: Stefan Ehmann shoesoft at gmx dot net


# e8444a7e 11-Feb-2006 Poul-Henning Kamp <phk@FreeBSD.org>

CPU time accounting speedup (step 2)

Keep accounting time (in per-cpu) cputicks and the statistics counts
in the thread and summarize into struct proc when at context switch.

Don't reach across CPUs in calcru().

Add code to calibrate the top speed of cpu_tickrate() for variable
cpu_tick hardware (like TSC on power managed machines).

Don't enforce monotonicity (at least for now) in calcru. While the
calibrated cpu_tickrate ramps up it may not be true.

Use 27MHz counter on i386/Geode.

Use TSC on amd64 & i386 if present.

Use tick counter on sparc64


# 5b1a8eb3 07-Feb-2006 Poul-Henning Kamp <phk@FreeBSD.org>

Modify the way we account for CPU time spent (step 1)

Keep track of time spent by the cpu in various contexts in units of
"cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t
only when somebody wants to inspect the numbers.

For now "cputicks" are still derived from the current timecounter
and therefore things should by definition remain sensible also on
SMP machines. (The main reason for this first milestone commit is
to verify that hypothesis.)

On slower machines, the avoided multiplications to normalize timestams
at every context switch, comes out as a 5-7% better score on the
unixbench/context1 microbenchmark. On more modern hardware no change
in performance is seen.


# 6807424d 24-Jan-2006 Stephan Uphoff <ups@FreeBSD.org>

Back out changes made in rev. 1.151.
They were bogus.

Cluebat applied by: jhb@


# 03001f59 23-Jan-2006 Stephan Uphoff <ups@FreeBSD.org>

Hopefully fix the "calcru: runtime went backwards from ..." problem by
keeping the resource values locked (where needed) while we use them
for calculations.

MFC after: 3 days


# 1471f287 02-Nov-2005 Paul Saab <ps@FreeBSD.org>

Calling setrlimit from 32bit apps could potentially increase certain
limits beyond what should be capiable in a 32bit process, so we
must fixup the limits.

Reviewed by: jhb


# b2149bde 27-Sep-2005 John Baldwin <jhb@FreeBSD.org>

Use the reference count API to manage the reference counts for process
limit structures rather than using pool mutexes to protect the reference
counts.

Tested on: i386, alpha, sparc64


# f0e51320 01-Jun-2005 Alan Cox <alc@FreeBSD.org>

Giant is no longer required in kern_setrlimit(); remove its acquisition and
release.

Reviewed by: jhb


# 63710c4d 30-Dec-2004 John Baldwin <jhb@FreeBSD.org>

Stop explicitly touching td_base_pri outside of the scheduler and simply
set a thread's priority via sched_prio() when that is the desired action.
The schedulers will start managing td_base_pri internally shortly.


# 78c85e8d 05-Oct-2004 John Baldwin <jhb@FreeBSD.org>

Rework how we store process times in the kernel such that we always store
the raw values including for child process statistics and only compute the
system and user timevals on demand.

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
times it needs rather than calling getrusage() twice with associated
stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
for user, system, and interrupt time as well as a bintime of the total
runtime. A new p_rux field in struct proc replaces the same inline fields
from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux
field in struct proc contains the "raw" child time usage statistics.
ruadd() has been changed to handle adding the associated rusage_ext
structures as well as the values in rusage. Effectively, the values in
rusage_ext replace the ru_utime and ru_stime values in struct rusage. These
two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
calculates appropriate timevals for user and system time as well as updating
the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a
copy of the process' p_rux structure to compute the timevals after updating
the runtime appropriately if any of the threads in that process are
currently executing. It also now only locks sched_lock internally while
doing the rux_runtime fixup. calcru() now only requires the caller to
hold the proc lock and calcru1() only requires the proc lock internally.
calcru() also no longer allows callers to ask for an interrupt timeval
since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by
calling calcru1() on p_crux. Note that this means that any code that wants
child times must now call this function rather than reading from p_cru
directly. This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the
proctree lock is used to protect the process group rather than the process
group lock. By holding this lock until the end of the function we now
ensure that the process/thread that we pick to dump info about will no
longer vanish while we are trying to output its info to the console.

Submitted by: bde (mostly)
MFC after: 1 month


# 6111dcd2 23-Sep-2004 John Baldwin <jhb@FreeBSD.org>

A modest collection of various and sundry style, spelling, and whitespace
fixes.

Submitted by: bde (mostly)


# 7eaec467 22-Sep-2004 John Baldwin <jhb@FreeBSD.org>

Various small style fixes.


# 5dd3a4ed 06-Aug-2004 Robert Watson <rwatson@FreeBSD.org>

Push UIDINFO_UNLOCK() slightly earlier in chgsbize(), as it's not
needed if we print the local variable version of the limit rather
than the shared version.


# 1b93405c 04-Aug-2004 Robert Watson <rwatson@FreeBSD.org>

Remove spl's from kern_resource.c.


# 56f21b9d 26-Jul-2004 Colin Percival <cperciva@FreeBSD.org>

Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with: rwatson, scottl
Requested by: jhb


# ba39a1c5 21-Jun-2004 Bruce Evans <bde@FreeBSD.org>

Turned off the "calcru: negative time" warning for certain SMP cases
where it is known to detect a problem but the problem is not very easy
to fix. The warning became very common recently after a call to calcru()
was added to fill_kinfo_thread().

Another (much older) cause of "negative times" (actually non-monotonic
times) was fixed in rev.1.237 of kern_exit.c.

Print separate messages for non-monotonic and negative times.


# fa885116 15-Jun-2004 Julian Elischer <julian@FreeBSD.org>

Nice, is a property of a process as a whole..
I mistakenly moved it to the ksegroup when breaking up the process
structure. Put it back in the proc structure.


# 1930e303 11-Jun-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Deorbit COMPAT_SUNOS.

We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither
a sparc32 port nor a SunOS4.x compatibility desire these days.


# 60f798c1 08-May-2004 Julian Elischer <julian@FreeBSD.org>

Fix rtprio() to do sensible things when called from threaded processes.
It's not quite correct from a posix Point Of view, but it is a lot better
than what was there before. This will be revisited later
when we decide what form our priority extensions will take. Posix doesn't
specify how a system scope thread can change its priority so you need to
add non-standard extensions to be able to do it..
For now make this slightly non standard to allow it to be done.

Submitted by: Dan Eischen originally, changed by myself.


# 4ddd1e65 10-Apr-2004 Maxime Henrion <mux@FreeBSD.org>

Remove a comment that complains about the lack of %qd, to justify
truncating a rlim_t to a long. We have %qd since some time now.
However, the correct format to use here is %jd and a cast to
intmax_t, so do this.


# 7f8a436f 05-Apr-2004 Warner Losh <imp@FreeBSD.org>

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# e7a44cac 11-Feb-2004 John Baldwin <jhb@FreeBSD.org>

Argh! Fix a bogon. lim_cur() was returning the hard (max) limit rather
than the soft (cur) limit.

Submitted by: bde


# a875f385 06-Feb-2004 John Baldwin <jhb@FreeBSD.org>

- Convert the plimit lock to a pool mutex lock.
- Hide struct plimit from userland.

Submitted by: bde (2)


# f4daf056 06-Feb-2004 John Baldwin <jhb@FreeBSD.org>

- Correct the translation of old rlimit values to properly handle the old
RLIM_INFINITY case for ogetrlimit().
- Use %jd and intmax_t to output negative time in usec in calcru().
- Rework getrusage() to make a copy of the rusage struct into a local
variable while holding Giant and then do the copyout from the local
variable to avoid having to have the original process rusage struct
locked while doing the copyout (which would not be safe). This also
includes a few style fixes from Bruce to getrusage().

Submitted by: bde (1, parts of 3)
Suggested by: bde (2)


# 99b6e02b 06-Feb-2004 John Baldwin <jhb@FreeBSD.org>

A few more style fixes from Bruce including a few I missed last time.

Submitted by: bde


# b4323d77 05-Feb-2004 John Baldwin <jhb@FreeBSD.org>

- A lot of style and whitespace fixes.
- Update a few comments regarding locking notes.

Submitted by: bde (1, mostly)


# 91d5354a 04-Feb-2004 John Baldwin <jhb@FreeBSD.org>

Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count. The plimit
structure is treated similarly to struct ucred in that is is always copy
on write, so having a reference to a structure is sufficient to read from
it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
limits from a process to keep the limit structure from changing out from
under you while reading from it.
- Various global limits that are ints are not protected by a lock since
int writes are atomic on all the archs we support and thus a lock
wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
either an rlimit, or the current or max individual limit of the specified
resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
(it didn't used the stackgap when it should have) but uses lim_rlimit()
and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits. It
also no longer uses the stackgap for accessing sysctl's for the
ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result,
ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by: mtm (mostly, I only did a few cleanups and catchups)
Tested on: i386
Compiled on: alpha, amd64


# eab9cabf 27-Oct-2003 Jeff Roberson <jeff@FreeBSD.org>

- Don't set td_priority directly here, use sched_prio().


# 857d9c60 12-Jul-2003 Don Lewis <truckman@FreeBSD.org>

Extend the mutex pool implementation to permit the creation and use of
multiple mutex pools with different options and sizes. Mutex pools can
be created with either the default sleep mutexes or with spin mutexes.
A dynamically created mutex pool can now be destroyed if it is no longer
needed.

Create two pools by default, one that matches the existing pool that
uses the MTX_NOWITNESS option that should be used for building higher
level locks, and a new pool with witness checking enabled.

Modify the users of the existing mutex pool to use the appropriate pool
in the new implementation.

Reviewed by: jhb


# 677b542e 10-Jun-2003 David E. O'Brien <obrien@FreeBSD.org>

Use __FBSDID().


# 4d923fe3 23-Apr-2003 John Baldwin <jhb@FreeBSD.org>

Remove Giant from [gs]etpriority().


# a15cc359 22-Apr-2003 John Baldwin <jhb@FreeBSD.org>

Lock both the proc lock and sched_lock when calling sched_nice since
kg_nice is now protected by both. Being protected by both means that
other places in the kernel that want to read kg_nice only need one of the
two locks.


# 08865ba1 18-Apr-2003 John Baldwin <jhb@FreeBSD.org>

Add a couple of sched_lock asserts.


# f6f230fe 10-Apr-2003 Jeff Roberson <jeff@FreeBSD.org>

- Adjust sched hooks for fork and exec to take processes as arguments instead
of ksegs since they primarily operation on processes.
- KSEs take ticks so pass the kse through sched_clock().
- Add a sched_class() routine that adjusts a ksegrp pri class.
- Define a sched_fork_{kse,thread,ksegrp} and sched_exit_{kse,thread,ksegrp}
that will be used to tell the scheduler about new instances of these
structures within the same process. These will be used by THR and KSE.
- Change sched_4bsd to reflect this API update.


# 262c27b8 12-Mar-2003 Tim J. Robbins <tjr@FreeBSD.org>

Back out previous. The locking here needs a rethink.


# a7cbe87a 12-Mar-2003 Tim J. Robbins <tjr@FreeBSD.org>

Acquire sched_lock around use of FOREACH_KSEGRP_IN_PROC, accesses
to kg_nice and calls to sched_nice() in getpriority() and setpriority()
(really donice()).


# 27e39ae4 19-Feb-2003 Tim J. Robbins <tjr@FreeBSD.org>

Remove the PL_SHAREMOD flag from struct plimit, which could have been
used to share resource limits between rfork threads, but never was.
Removing it makes resource limit locking much simpler -- only the current
process can change the contents of the structure that p_limit points to.


# a163d034 18-Feb-2003 Warner Losh <imp@FreeBSD.org>

Back out M_* changes, per decision of the TRB.

Approved by: trb


# e4625663 16-Feb-2003 Jeff Roberson <jeff@FreeBSD.org>

- Move ke_sticks, ke_iticks, ke_uticks, ke_uu, ke_su, and ke_iu back into
the proc. These counters are only examined through calcru.

Submitted by: davidxu
Tested on: x86, alpha, UP/SMP


# 5ce623b8 13-Feb-2003 Tim J. Robbins <tjr@FreeBSD.org>

Add an XXX comment noting that getrusage() accesses p_stats->p_ru
and p_stats->p_cru without holding the appropriate locks.


# 6f8132a8 31-Jan-2003 Julian Elischer <julian@FreeBSD.org>

Reversion of commit by Davidxu plus fixes since applied.

I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.


# 0dbb100b 26-Jan-2003 David Xu <davidxu@FreeBSD.org>

Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.

Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.

When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian


# 44956c98 21-Jan-2003 Alfred Perlstein <alfred@FreeBSD.org>

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# b43179fb 11-Oct-2002 Jeff Roberson <jeff@FreeBSD.org>

- Create a new scheduler api that is defined in sys/sched.h
- Begin moving scheduler specific functionality into sched_4bsd.c
- Replace direct manipulation of scheduler data with hooks provided by the
new api.
- Remove KSE specific state modifications and single runq assumptions from
kern_switch.c

Reviewed by: -arch


# 5715307f 09-Oct-2002 John Baldwin <jhb@FreeBSD.org>

- Move p_cpulimit to struct proc from struct plimit and protect it with
sched_lock. This means that we no longer access p_limit in mi_switch()
and the p_limit pointer can be protected by the proc lock.
- Remove PRS_ZOMBIE check from CPU limit test in mi_switch(). PRS_ZOMBIE
processes don't call mi_switch(), and even if they did there is no longer
the danger of p_limit being NULL (which is what the original zombie check
was added for).
- When we bump the current processes soft CPU limit in ast(), just bump the
private p_cpulimit instead of the shared rlimit. This fixes an XXX for
some value of fix. There is still a (probably benign) bug in that this
code doesn't check that the new soft limit exceeds the hard limit.

Inspired by: bde (2)


# f4cd8f9f 30-Sep-2002 John Baldwin <jhb@FreeBSD.org>

Change p_cpulimit to be in seconds instead of microseconds. Since
p_runtime now is a bintime, it is no longer an optimization to store
p_cpulimit as microseconds.

Suggested by: phk


# 05ba50f5 21-Sep-2002 Jake Burkholder <jake@FreeBSD.org>

Use the fields in the sysentvec and in the vm map header in place of the
constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS.
This is mainly so that they can be variable even for the native abi, based
on different machine types. Get stack protections from the sysentvec too.
This makes it trivial to map the stack non-executable for certain abis, on
machines that support it.


# 4f0db5e0 15-Sep-2002 Julian Elischer <julian@FreeBSD.org>

Allocate KSEs and KSEGRPs separatly and remove them from the proc structure.
next step is to allow > 1 to be allocated per process. This would give
multi-processor threads. (when the rest of the infrastructure is
in place)

While doing this I noticed libkvm and sys/kern/kern_proc.c:fill_kinfo_proc
are diverging more than they should.. corrective action needed soon.


# f824b518 23-Jul-2002 John Polstra <jdp@FreeBSD.org>

Widen struct sockbuf's sb_timeo member to int from short. With
non-default but reasonable values of hz this member overflowed,
breaking NFS over UDP.

Also, as long as I'm plowing up struct sockbuf ... Change certain
members from u_long/long to u_int/int in order to reduce wasted
space on 64-bit machines. This change was requested by Andrew
Gallatin.

Netstat and systat need to be rebuilt. I am incrementing
__FreeBSD_version in case any ports need to change.


# 01609114 28-Jun-2002 Alfred Perlstein <alfred@FreeBSD.org>

more caddr_t removal.


# f44d9e24 18-May-2002 John Baldwin <jhb@FreeBSD.org>

Change p_can{debug,see,sched,signal}()'s first argument to be a thread
pointer instead of a proc pointer and require the process pointed to
by the second argument to be locked. We now use the thread ucred reference
for the credential checks in p_can*() as a result. p_canfoo() should now
no longer need Giant.


# ba626c1d 16-Apr-2002 John Baldwin <jhb@FreeBSD.org>

Lock proctree_lock instead of pgrpsess_lock.


# bad56603 13-Apr-2002 John Baldwin <jhb@FreeBSD.org>

- Change donice() to take a thread as the first argument instead of a
process so it can use td_ucred.
- Require the target process of donice() to be locked when donice() is
called.
- Use td_ucred.
- Lock the target process of p_cansee() and while reading the credentials
of a process.
- Change the logic of rtprio() slightly so it does it's copyin() if needed
prior to locking the target process.
- rtprio() no longer needs Giant. In theory with full KSE it would still
need Giant to protect p_ucred of curproc for the p_canfoo() functions
but p_canfoo() will be changing to using td_ucred of curthread before
full KSE hits the tree.


# 6008862b 04-Apr-2002 John Baldwin <jhb@FreeBSD.org>

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


# 44731cab 01-Apr-2002 John Baldwin <jhb@FreeBSD.org>

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


# 4d77a549 19-Mar-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove __P.


# c91f7a73 26-Feb-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Cast the variable, not the constant to 64 bits.


# f591779b 23-Feb-2002 Seigo Tanimura <tanimura@FreeBSD.org>

Lock struct pgrp, session and sigio.

New locks are:

- pgrpsess_lock which locks the whole pgrps and sessions,
- pg_mtx which protects the pgrp members, and
- s_mtx which protects the session members.

Please refer to sys/proc.h for the coverage of these locks.

Changes on the pgrp/session interface:

- pgfind() needs the pgrpsess_lock held.

- The caller of enterpgrp() is responsible to allocate a new pgrp and
session.

- Call enterthispgrp() in order to enter an existing pgrp.

- pgsignal() requires a pgrp lock held.

Reviewed by: jhb, alfred
Tested on: cvsup.jp.FreeBSD.org
(which is a quad-CPU machine running -current)


# 1cbb9c3b 22-Feb-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Convert p->p_runtime and PCPU(switchtime) to bintime format.


# 2c100766 11-Feb-2002 Julian Elischer <julian@FreeBSD.org>

In a threaded world, differnt priorirites become properties of
different entities. Make it so.

Reviewed by: jhb@freebsd.org (john baldwin)


# 547ce823 20-Jan-2002 Alfred Perlstein <alfred@FreeBSD.org>

use mutex pool mutexes for uidinfo locking.
replace mutex_lock calls on uidinfo with macro calls:
mtx_lock(&uidp->ui_mtx) -> UIDINFO_LOCK(uidp)

Terry Lambert <tlambert2@mindspring.com> helped with this.


# 9aefe36f 04-Nov-2001 Peter Wemm <peter@FreeBSD.org>

*** empty log message ***


# fc5d29ef 01-Nov-2001 Robert Watson <rwatson@FreeBSD.org>

o Move suser() calls in kern/ to using suser_xxx() with an explicit
credential selection, rather than reference via a thread or process
pointer. This is part of a gradual migration to suser() accepting
a struct ucred instead of a struct proc, simplifying the reference
and locking semantics of suser().

Obtained from: TrustedBSD Project


# 0e9fe212 28-Oct-2001 Matthew Dillon <dillon@FreeBSD.org>

Adjust printfs to be time_t agnostic.


# cbc89bfb 10-Oct-2001 Paul Saab <ps@FreeBSD.org>

Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader
tunable.

Reviewed by: peter
MFC after: 2 weeks


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# e342cd27 01-Sep-2001 John Baldwin <jhb@FreeBSD.org>

Use sched_lock to protect rtp_to_pri() and pri_to_rtp() when needed.


# 835a82ee 01-Sep-2001 Matthew Dillon <dillon@FreeBSD.org>

Giant Pushdown. Saved the worst P4 tree breakage for last.

reboot() getpriority() setpriority() rtprio() osetrlimit() ogetrlimit()
setrlimit() getrlimit() getrusage() getpid() getppid() getpgrp()
getpgid() getsid() getgid() getegid() getgroups() setsid() setpgid()
setuid() seteuid() setgid() setegid() setgroups() setreuid() setregid()
setresuid() setresgid() getresuid() getresgid () __setugid() getlogin()
setlogin() modnext() modfnext() modstat() modfind() kldload() kldunload()
kldfind() kldnext() kldstat() kldfirstmod() kldsym() getdtablesize()
dup2() dup() fcntl() close() ofstat() fstat() nfsstat() fpathconf()
flock()


# 8cfdf322 21-Jul-2001 Assar Westerlund <assar@FreeBSD.org>

add prototype for dosetrlimit


# a0f75161 05-Jul-2001 Robert Watson <rwatson@FreeBSD.org>

o Replace calls to p_can(..., P_CAN_xxx) with calls to p_canxxx().
The p_can(...) construct was a premature (and, it turns out,
awkward) abstraction. The individual calls to p_canxxx() better
reflect differences between the inter-process authorization checks,
such as differing checks based on the type of signal. This has
a side effect of improving code readability.
o Replace direct credential authorization checks in ktrace() with
invocation of p_candebug(), while maintaining the special case
check of KTR_ROOT. This allows ktrace() to "play more nicely"
with new mandatory access control schemes, as well as making its
authorization checks consistent with other "debugging class"
checks.
o Eliminate "privused" construct for p_can*() calls which allowed the
caller to determine if privilege was required for successful
evaluation of the access control check. This primitive is currently
unused, and as such, serves only to complicate the API.

Approved by: ({procfs,linprocfs} changes) des
Obtained from: TrustedBSD Project


# 0cddd8f0 04-Jul-2001 Matthew Dillon <dillon@FreeBSD.org>

With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.


# 23955314 18-May-2001 Alfred Perlstein <alfred@FreeBSD.org>

Introduce a global lock for the vm subsystem (vm_mtx).

vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb


# e6af1080 29-Apr-2001 Jake Burkholder <jake@FreeBSD.org>

Make rtprio work again.
- add a missing break which caused RTP_SET to always return EINVAL
- break instead of returning if p_can fails so proc_lock is always
dropped correctly
- only copyin data that is actually needed
- use break instead of goto
- make rtp_to_pri return EINVAL instead of -1 if the values are out
or range so we don't have to translate


# 33a9ed9d 23-Apr-2001 John Baldwin <jhb@FreeBSD.org>

Change the pfind() and zpfind() functions to lock the process that they
find before releasing the allproc lock and returning.

Reviewed by: -smp, dfr, jake


# d34f8d30 12-Apr-2001 Robert Watson <rwatson@FreeBSD.org>

o Limit process information leakage by introducing a p_can(...P_CAN_SEE...)
in rtprio()'s RTP_LOOKIP implementation.

Obtained from: TrustedBSD Project


# 1005a129 28-Mar-2001 John Baldwin <jhb@FreeBSD.org>

Convert the allproc and proctree locks from lockmgr locks to sx locks.


# f34fa851 28-Mar-2001 John Baldwin <jhb@FreeBSD.org>

Catch up to header include changes:
- <sys/mutex.h> now requires <sys/systm.h>
- <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>


# 9708152c 09-Mar-2001 Alfred Perlstein <alfred@FreeBSD.org>

Don't call malloc with M_WAITOK while holding a mutex.


# 35030da9 22-Feb-2001 Tor Egge <tegge@FreeBSD.org>

Backout previous commit. sched_lock is held, thus interrupts are prevented
here.

Submitted by: jhb


# 0d139b37 22-Feb-2001 Tor Egge <tegge@FreeBSD.org>

Protect update of the per processor switchtime variable against
interrupts.

Protect usage of the per processor switchtime variable against
interrupts in calcru().

This seem to eliminate the "microuptime() went backwards" warnings.


# d82b3e31 20-Feb-2001 Tor Egge <tegge@FreeBSD.org>

Ensure that RLIMIT_NPROC limits are at least 1 to avoid bad interaction
with chgproccnt. MFC candiate.

Reviewed by: alfred


# d5a08a60 11-Feb-2001 Jake Burkholder <jake@FreeBSD.org>

Implement a unified run queue and adjust priority levels accordingly.

- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propogated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propogation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propogate_priority(). It should now have the desired
effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propogate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propogate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at apprioriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. Deliberately
change its size by adjusting the spare fields. It remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.


# 9ed346ba 08-Feb-2001 Bosko Milekic <bmilekic@FreeBSD.org>

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


# 7871c121 24-Jan-2001 John Baldwin <jhb@FreeBSD.org>

- Add a mtx_assert() for sched_lock in calcru().
- Protect calcru() with sched_lock later on in the file when it is called.


# ef73ae4b 09-Jan-2001 Jake Burkholder <jake@FreeBSD.org>

Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables
other then curproc.


# c0c25570 12-Dec-2000 Jake Burkholder <jake@FreeBSD.org>

- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr. Also provides macros for the flags
pased to specify shared, exclusive or release which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.


# c5a86b0a 30-Nov-2000 Alfred Perlstein <alfred@FreeBSD.org>

Translate alfred to english.

Submitted by: bde


# 1baf4aab 30-Nov-2000 Alfred Perlstein <alfred@FreeBSD.org>

use a oppurtunistic locking strategy with the uidinfo structures to avoid
locking the global hash on each uifree()

make struct uidinfo only visible to the kernel

make uihold() a function rather than a macro to reduce bloat

swap the order of a spl/mutex to maintain consistancy


# 9c19bcdd 25-Nov-2000 Alfred Perlstein <alfred@FreeBSD.org>

Make uidinfo subsystem mpsafe
use a mutex lock when looking up/deleting entries on the hashlist
use a mutex lock on each uidinfo when updating fields

make uifree() a void function rather than 'int' since no one cares

allocate uidinfo structs with the M_ZERO flag and don't explicitly initialize
them

Assisted by: eivind, jhb, jakeb


# 553629eb 22-Nov-2000 Jake Burkholder <jake@FreeBSD.org>

Protect the following with a lockmgr lock:

allproc
zombproc
pidhashtbl
proc.p_list
proc.p_hash
nextpid

Reviewed by: jhb
Obtained from: BSD/OS and netbsd


# b429049a 18-Sep-2000 Paul Saab <ps@FreeBSD.org>

Add new line character to debugging printf's.


# 0384fff8 06-Sep-2000 Jason Evans <jasone@FreeBSD.org>

Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).

Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh


# c5930ee4 06-Sep-2000 Don Lewis <truckman@FreeBSD.org>

Change the calls to panic() in uifree(), chgproccnt(), and chgsbsize()
to printf(). Any errors detected are not likely to be fatal, so it
should be safe to let things keep running.


# f535380c 05-Sep-2000 Don Lewis <truckman@FreeBSD.org>

Remove uidinfo hash table lookup and maintenance out of chgproccnt() and
chgsbsize(), which are called rather frequently and may be called from an
interrupt context in the case of chgsbsize(). Instead, do the hash table
lookup and maintenance when credentials are changed, which is a lot less
frequent. Add pointers to the uidinfo structures to the ucred and pcred
structures for fast access. Pass a pointer to the credential to chgproccnt()
and chgsbsize() instead of passing the uid. Add a reference count to the
uidinfo structure and use it to decide when to free the structure rather
than freeing the structure when the resource consumption drops to zero.
Move the resource tracking code from kern_proc.c to kern_resource.c. Move
some duplicate code sequences in kern_prot.c to separate helper functions.
Change KASSERTs in this code to unconditional tests and calls to panic().


# 387d2c03 29-Aug-2000 Robert Watson <rwatson@FreeBSD.org>

o Centralize inter-process access control, introducing:

int p_can(p1, p2, operation, privused)

which allows specification of subject process, object process,
inter-process operation, and an optional call-by-reference privused
flag, allowing the caller to determine if privilege was required
for the call to succeed. This allows jail, kern.ps_showallprocs and
regular credential-based interaction checks to occur in one block of
code. Possible operations are P_CAN_SEE, P_CAN_SCHED, P_CAN_KILL,
and P_CAN_DEBUG. p_can currently breaks out as a wrapper to a
series of static function checks in kern_prot, which should not
be invoked directly.

o Commented out capabilities entries are included for some checks.

o Update most inter-process authorization to make use of p_can() instead
of manual checks, PRISON_CHECK(), P_TRESPASS(), and
kern.ps_showallprocs.

o Modify suser{,_xxx} to use const arguments, as it no longer modifies
process flags due to the disabling of ASU.

o Modify some checks/errors in procfs so that ENOENT is returned instead
of ESRCH, further improving concealment of processes that should not
be visible to other processes. Also introduce new access checks to
improve hiding of processes for procfs_lookup(), procfs_getattr(),
procfs_readdir(). Correct a bug reported by bp concerning not
handling the CREATE case in procfs_lookup(). Remove volatile flag in
procfs that caused apparently spurious qualifier warnigns (approved by
bde).

o Add comment noting that ktrace() has not been updated, as its access
control checks are different from ptrace(), whereas they should
probably be the same. Further discussion should happen on this topic.

Reviewed by: bde, green, phk, freebsd-security, others
Approved by: bde
Obtained from: TrustedBSD Project


# 0a7d1711 23-Aug-2000 Brian Feldman <green@FreeBSD.org>

Revert the suser -> suser_xxx change made previously. It was right
before.


# 9b969686 16-Aug-2000 Brian Feldman <green@FreeBSD.org>

Fix a couple cases where p_trespass wasn't transitioned into place.

Make RTP_SET (rtprio) only accessible to real root, not root in jails.


# c27f4d3c 10-Jun-2000 Poul-Henning Kamp <phk@FreeBSD.org>

fix a typo


# 7cadc266 03-Jun-2000 Robert Watson <rwatson@FreeBSD.org>

o Modify jail to limit creation of sockets to UNIX domain sockets,
TCP/IP (v4) sockets, and routing sockets. Previously, interaction
with IPv6 was not well-defined, and might be inappropriate for some
environments. Similarly, sysctl MIB entries providing interface
information also give out only addresses from those protocol domains.

For the time being, this functionality is enabled by default, and
toggleable using the sysctl variable jail.socket_unixiproute_only.
In the future, protocol domains will be able to determine whether or
not they are ``jail aware''.

o Further limitations on process use of getpriority() and setpriority()
by jailed processes. Addresses problem described in kern/17878.

Reviewed by: phk, jmg


# 693e27d4 15-Feb-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Don't try to account for the partial quantum unless the process is
curproc. This only makes any difference on SMP, where we used a
(potentially very bogus) switchtime from our own CPU to calculate
resource usage on another CPU.

This should remove some if not all calcru() related warnings on SMP.

Approved by: jkh


# 8950d244 28-Jan-2000 Brian Feldman <green@FreeBSD.org>

Fix a bug that could crash the system if you press ^T while a slower
system is slowed down and in the right spot (a race condition in fork()).

The "previous time" fields have moved from pstat to proc. Anything which
uses KVM needs to be recompiled with a new libkvm/headers.

A couple wacky u_quad_t's in struct proc are now u_int64_t (the same, but
according to lack of 'quad's in proc.h and usage in kern_resource.c).
This will have no effect on code.

This has been make-world-and-installed-new-kernel-which-works-fine-tested.

Reviewed by: bde (previous version)


# 8f04f6c7 29-Nov-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Add a bit of sanity checking and problem avoidance in case the
timecounter hardware is bogus.

This will produce a new warning "microuptime() went backwards"
and try to not screw up the process resource accounting.


# 2e3c8fcb 16-Nov-1999 Poul-Henning Kamp <phk@FreeBSD.org>

This is a partial commit of the patch from PR 14914:

Alot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY
structures for list operations. This patch makes all list operations
in sys/kern use the queue(3) macros, rather than directly accessing the
*Q_{HEAD,ENTRY} structures.

This batch of changes compile to the same object files.

Reviewed by: phk
Submitted by: Jake Burkholder <jake@checker.org>
PR: 14914


# 923502ff 29-Oct-1999 Poul-Henning Kamp <phk@FreeBSD.org>

useracc() the prequel:

Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.


# d1f088da 11-Oct-1999 Peter Wemm <peter@FreeBSD.org>

Trim unused options (or #ifdef for undoc options).

Submitted by: phk


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# 75c13541 28-Apr-1999 Poul-Henning Kamp <phk@FreeBSD.org>

This Implements the mumbled about "Jail" feature.

This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

I have no scripts for setting up a jail, don't ask me for them.

The IP number should be an alias on one of the interfaces.

mount a /proc in each jail, it will make ps more useable.

/proc/<pid>/status tells the hostname of the prison for
jailed processes.

Quotas are only sensible if you have a mountpoint per prison.

There are no privisions for stopping resource-hogging.

Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/


# 1c308b81 26-Apr-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Change suser_xxx() to suser() where it applies.


# f711d546 27-Apr-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Suser() simplification:

1:
s/suser/suser_xxx/

2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.


# 6fc8f347 13-Mar-1999 Bruce Evans <bde@FreeBSD.org>

Enforce monotonicity of apparent process user, system and interrupt times.

PR: 975, 10402


# 56ce1a8d 11-Mar-1999 Bruce Evans <bde@FreeBSD.org>

Fixed runtime accounting. The time since the previous context switch
was discarded on every call to calcru(). Hacking on the `switchtime'
global for a related fix in rev.1.38 of kern_resource.c was too fragile
and broke when p_switchtime went away.

PR: 10402


# efc96764 05-Mar-1999 Bruce Evans <bde@FreeBSD.org>

The magic "no-cpu" cpu number is 0xff. Don't misrepresent cpu
numbers as chars or use bogus casts in an attempt to unmisrepresnt
them. In top, don't assume that 0xff is the only negative cpu
number when cpu numbers are (mis)represented.


# e7ba67f2 28-Feb-1999 Bruce Evans <bde@FreeBSD.org>

Removed all traces of `p_switchtime'. The relevant timestamp is per-cpu,
not per-process. Keep it in `switchtime' consistently.

It is now clear that the timestamp is always valid in fork_trampoline()
except when the child is running on a previously idle cpu, which
can only happen if there are multiple cpus, so don't check or set
the timestamp in fork_trampoline except in the (i386) SMP case.
Just remove the alpha code for setting it unconditionally, since
there is no SMP case for alpha and the code had rotted.

Parts reviewed by: dfr, phk


# 1b0b259e 25-Feb-1999 Bruce Evans <bde@FreeBSD.org>

Don't forget to update `switchticks' in corner cases (except for
the alpha fork_trampoline(), forget it because it I believe it is
only necessary for the unsupported SMP case).


# ba198b1c 30-Jan-1999 Mark Newton <newton@FreeBSD.org>

Added comments about non-staticization so it doesn't get un-done next
time someone goes on a staticization binge.

Suggested by: eivind


# 69a6f20b 29-Jan-1999 Mark Newton <newton@FreeBSD.org>

Unstaticized routines which are needed by the svr4 KLD and the streams
garbage needed to support SysVR4 networking.


# bc9e7c3b 27-Jul-1998 Bruce Evans <bde@FreeBSD.org>

Fixed double counting of runtime after a process exits. The last
timeslice of the exiting process was counted for both the exiting
process and the next process to run if the next process runs
immediately.

Broken in: mostly in kern_clock.c rev.1.70 (1998/05/28)


# e796e00d 28-May-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Some cleanups related to timecounters and weird ifdefs in <sys/time.h>.

Clean up (or if antipodic: down) some of the msgbuf stuff.

Use an inline function rather than a macro for timecounter delta.

Maintain process "on-cpu" time as 64 bits of microseconds to avoid
needless second rollover overhead.

Avoid calling microuptime the second time in mi_switch() if we do
not pass through _idle in cpu_switch()

This should reduce our context-switch overhead a bit, in particular
on pre-P5 and SMP systems.

WARNING: Programs which muck about with struct proc in userland
will have to be fixed.

Reviewed, but found imperfect by: bde


# c21410e1 17-May-1998 Poul-Henning Kamp <phk@FreeBSD.org>

s/nanoruntime/nanouptime/g
s/microruntime/microuptime/g

Reviewed by: bde


# b90dcc0c 04-Apr-1998 Peter Wemm <peter@FreeBSD.org>

Fix previous commit. Don't people read compiler messages or something??


# 00af9731 04-Apr-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Time changes mark 2:

* Figure out UTC relative to boottime. Four new functions provide
time relative to boottime.

* move "runtime" into struct proc. This helps fix the calcru()
problem in SMP.

* kill mono_time.

* add timespec{add|sub|cmp} macros to time.h. (XXX: These may change!)

* nanosleep, select & poll takes long sleeps one day at a time

Reviewed by: bde
Tested by: ache and others


# 644d85f4 04-Mar-1998 Peter Dufault <dufault@FreeBSD.org>

Reviewed by: msmith, bde long ago
Fix for RTPRIO scheduler to eliminate invalid context switches.

POSIX.4 headers and sysctl variables. Nothing should change
unless POSIX4 is defined or _POSIX_VERSION is set to 199309.


# 303b270b 08-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Staticize.


# 15406740 04-Feb-1998 David Greenman <dg@FreeBSD.org>

Restrict idleprio to superuser:

Realtime priority has to be restricted for reasons which should be
obvious. However, for idle priority, there is a potential for
system deadlock if an idleprio process gains a lock on a resource
that other processes need (and the idleprio process can't run
due to a CPU-bound normal process). Fix me! XXX
PR: 5639


# ffbb164e 18-Jan-1998 Bruce Evans <bde@FreeBSD.org>

Set p_retval for the correct process in getpriority(). This fixes
a null pointer panic when the pointer for the incorrect process is
NULL. getpriority() was broken in rev.1.27. Rev.1.28 broke the
warning instead of fixing the problem.

PR: 5495


# 5591b823d 16-Dec-1997 Eivind Eklund <eivind@FreeBSD.org>

Make COMPAT_43 and COMPAT_SUNOS new-style options.


# 4a11ca4e 07-Nov-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Remove a bunch of variables which were unused both in GENERIC and LINT.

Found by: -Wunused


# cb226aaa 06-Nov-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.


# 1d9655ae 25-Aug-1997 Bruce Evans <bde@FreeBSD.org>

Print more info in the "calcru: negative time" message.


# 477a642c 26-Apr-1997 Peter Wemm <peter@FreeBSD.org>

Man the liferafts! Here comes the long awaited SMP -> -current merge!

There are various options documented in i386/conf/LINT, there is more to
come over the next few days.

The kernel should run pretty much "as before" without the options to
activate SMP mode.

There are a handful of known "loose ends" that need to be fixed, but
have been put off since the SMP kernel is in a moderately good condition
at the moment.

This commit is the result of the tinkering and testing over the last 14
months by many people. A special thanks to Steve Passe for implementing
the APIC code!


# 6875d254 22-Feb-1997 Peter Wemm <peter@FreeBSD.org>

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 996c772f 09-Feb-1997 John Dyson <dyson@FreeBSD.org>

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# e9822d92 22-Dec-1996 Joerg Wunsch <joerg@FreeBSD.org>

Make DFLDSIZ and MAXDSIZ fully-supported options.

"Don't forget to do a ``make depend''" :-)


# 604396ff 08-Jun-1996 Bruce Evans <bde@FreeBSD.org>

Fixed accumulation of run time for processes that don't accumulate
any statclock ticks. Pretend that all the time up to the first
statclock tick is system time. . This makes a difference mainly for
benchmarks that test short-lived processes - the user and system
times for processes that each lived for about 1ms only added up to
about 10% of the real time even when there was very little interrupt
activity.

Break the printing of a quad_t variable correctly.


# edbfedac 11-Mar-1996 Peter Wemm <peter@FreeBSD.org>

Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all
files are off the vendor branch, so this should not change anything.

A "U" marker generally means that the file was not changed in between
the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally
means that there was a change.
[note new unused (in this form) syscalls.conf, to be 'cvs rm'ed]


# ec3f61ff 10-Mar-1996 Jeffrey Hsu <hsu@FreeBSD.org>

From Lite2: proc LIST changes
stylistic changes to function prototypes
Reviewed by: david & bde


# 505ae68e 16-Jan-1996 Poul-Henning Kamp <phk@FreeBSD.org>

Fix a printf, well, actually break it, that is...
We don't have the ability to print 64bit things yet...


# efeaf95a 06-Dec-1995 David Greenman <dg@FreeBSD.org>

Untangled the vm.h include file spaghetti.


# d2d3e875 11-Nov-1995 Bruce Evans <bde@FreeBSD.org>

Included <sys/sysproto.h> to get central declarations for syscall args
structs and prototypes for syscalls.

Ifdefed duplicated decentralized declarations of args structs. It's
convenient to have this visible but they are hard to maintain. Some
are already different from the central declarations. 4.4lite2 puts
them in comments in the function headers but I wanted to avoid the
large changes for that.


# cf7e5eab 10-Nov-1995 Bruce Evans <bde@FreeBSD.org>

Fixed types of rtprio(), osetrlimit() and setrlimit(). The args struct
tag and/or member names conflicted with the machine generated ones in
<sys/sysproto.h>.


# 8159bae9 23-Oct-1995 Bruce Evans <bde@FreeBSD.org>

Fix a sign extension bug that was unleashed by the previous change.
The total process time was sometimes 2^32 usec too large but that
wasn't a problem before because the time was bogusly truncated mod
2^32.


# bcee717b 21-Oct-1995 Bruce Evans <bde@FreeBSD.org>

Avoid overflow in calcru(). Fixes PR 788.

Submitted by: imdave@synet.net (Dave Bodenstab)


# 9b2e5354 30-May-1995 Rodney W. Grimes <rgrimes@FreeBSD.org>

Remove trailing whitespace.


# e6373c9e 20-Feb-1995 Guido van Rooij <guido@FreeBSD.org>

Implement maxprocperuid and maxfilesperproc. They are tunable
via sysctl(8). The initial value of maxprocperuid is maxproc-1,
that of maxfilesperproc is maxfiles (untill maxfile will disappear)

Now it is at least possible to prohibit one user opening maxfiles

-Guido

Submitted by:
Obtained from:


# a81b757a 06-Dec-1994 Bruce Evans <bde@FreeBSD.org>

Don't allow negative limits at all. Convert them to RLIM_INFINITY instead
of returning EINVAL since something may depend on them being broken.
Allowing negative limits caused bugs almost everywhere. The recent
fixes for MAXSSIZ checked the limits too late to stop anyone defeating
limits set by root...


# 06935b59 02-Dec-1994 Andreas Schulz <ats@FreeBSD.org>

Add one forgotten u_quad_t typecast in dosetrlimit.


# a7d72265 01-Dec-1994 Andreas Schulz <ats@FreeBSD.org>

The values for setrlimit in the data size and stack size case are
used as an address value. Then all comparisons should be done unsigned
and not signed. Fix it with a typecast of u_quad_t.
Error can be demonstrated with the current bash in port, do a
ulimit -s unlimited and the machine hangs. bash delivers through
an internal error a large negative value for the stacksize, the
comparison saw this smaller than MAXSSIZ and then tried to expand
the stack to this size.


# d93f860c 09-Oct-1994 Poul-Henning Kamp <phk@FreeBSD.org>

Cosmetics. related to getting prototypes into view.


# 7216391e 01-Oct-1994 David Greenman <dg@FreeBSD.org>

"idle priority" support. Based on code from Henrik Vestergaard Draboel,
but substantially rewritten by me.


# bb56ec4a 25-Sep-1994 Poul-Henning Kamp <phk@FreeBSD.org>

While in the real world, I had a bad case of being swapped out for a lot of
cycles. While waiting there I added a lot of the extra ()'s I have, (I have
never used LISP to any extent). So I compiled the kernel with -Wall and
shut up a lot of "suggest you add ()'s", removed a bunch of unused var's
and added a couple of declarations here and there. Having a lap-top is
highly recommended. My kernel still runs, yell at me if you kernel breaks.


# e8fb0b2c 31-Aug-1994 David Greenman <dg@FreeBSD.org>

Realtime priority scheduling support.

Submitted by: Henrik Vestergaard Draboel


# 3c4dd356 02-Aug-1994 David Greenman <dg@FreeBSD.org>

Added $Id$


# 26f9a767 25-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# df8bae1d 24-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

BSD 4.4 Lite Kernel Sources