History log of /freebsd-9.3-release/sys/kern/kern_ktrace.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 267654 19-Jun-2014 gjb

Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 267015 03-Jun-2014 delphij

Fix ktrace kernel memory disclosure. [SA-14:12]

Fix incorrect error handling in PAM policy parser. [SA-14:13]

Approved by: re (glebius)


# 237719 28-Jun-2012 jhb

MFC 234494:
Include the associated wait channel message for context switch ktrace
records. kdump supports both the old and new messages.


# 237663 27-Jun-2012 jhb

MFC 233925,236357:
Add new ktrace records for the start and end of VM faults. This gives
a pair of records similar to syscall entry and return that a user can
use to determine how long page faults take. The new ktrace records are
enabled via the 'p' trace type, but are not enabled in the default set of
trace points.


# 233353 23-Mar-2012 kib

MFC r231949:
Fix found places where uio_resid is truncated to int.

Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from
the usermode.

MFC r232493:
Remove unneeded cast to u_int. The values as small enough to fit into
int, beside the use of MIN macro which performs type promotions.

MFC r232494:
Instead of incomplete handling of read(2)/write(2) return values that
does not fit into registers, declare that we do not support this case
using CTASSERT(), and remove endianess-unsafe code to split return value
into td_retval.

While there, change the style of the sysctl debug.iosize_max_clamp
definition.

MFC r232495:
pipe_read(): change the type of size to int, and remove signed clamp.
pipe_write(): change the type of desiredsize back to int, its value fits.


# 230160 15-Jan-2012 eadler

MFC r228343:
- Fix ktrace leakage if error is set

PR: kern/163098
Approved by: sbruno


# 225736 22-Sep-2011 kensmith

Copy head to stable/9 as part of 9.0-RELEASE release cycle.

Approved by: re (implicit)


# 225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


# 220390 06-Apr-2011 jhb

Fix several places to ignore processes that are not yet fully constructed.

MFC after: 1 week


# 219312 05-Mar-2011 dchagin

Style(9) fix.
Fix indentation in comment, double ';' in variable declaration.

MFC after: 1 Week


# 219311 05-Mar-2011 dchagin

Partially reworked r219042.
The reason for this is a bug at ktrops() where process dereferenced
without having a lock. This might cause a panic if ktrace was runned
with -p flag and the specified process exited between the dropping
a lock and writing sv_flags.

Since it is impossible to acquire sx lock while holding mtx switch
to use asynchronous enqueuerequest() instead of writerequest().

Rename ktr_getrequest_ne() to more understandable name [1].

Requested by: jhb [1]

MFC after: 1 Week


# 219042 25-Feb-2011 dchagin

Introduce preliminary support of the show description of the ABI of
traced process by adding two new events which records value of process
sv_flags to the trace file at process creation/execing/exiting time.

MFC after: 1 Month.


# 219041 25-Feb-2011 dchagin

ktrace_resize_pool() locking slightly reworked:
1) do not take a lock around the single atomic operation.
2) do not lose the invariant of lock by dropping/acquiring
ktrace_mtx around free() or malloc().

MFC after: 1 Month.


# 219028 25-Feb-2011 netchild

Add some FEATURE macros for various features (AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/
PMC/SYSV/...).

No FreeBSD version bump, the userland application to query the features will
be committed last and can serve as an indication of the availablility if
needed.

Sponsored by: Google Summer of Code 2010
Submitted by: kibab
Reviewed by: arch@ (parts by rwatson, trasz, jhb)
X-MFC after: to be determined in last commit with code from this project


# 214158 21-Oct-2010 jhb

- When disabling ktracing on a process, free any pending requests that
may be left. This fixes a memory leak that can occur when tracing is
disabled on a process via disabling tracing of a specific file (or if
an I/O error occurs with the tracefile) if the process's next system
call is exit(). The trace disabling code clears p_traceflag, so exit1()
doesn't do any KTRACE-related cleanup leading to the leak. I chose to
make the free'ing of pending records synchronous rather than patching
exit1().
- Move KTRACE-specific logic out of kern_(exec|exit|fork).c and into
kern_ktrace.c instead. Make ktrace_mtx private to kern_ktrace.c as a
result.

MFC after: 1 month


# 211512 19-Aug-2010 jhb

Fix a whitespace nit and remove a questioning comment. STAILQ_CONCAT()
does require the STAILQ the existing list is being added to to already
be initialized (it is CONCAT() vs MOVE()).


# 211439 17-Aug-2010 jhb

Keep the process locked when calling ktrops() or ktrsetchildren() instead
of dropping the lock only to immediately reacquire it.


# 211102 09-Aug-2010 gavin

Add descriptions to a handful of sysctl nodes.

PR: kern/148580
Submitted by: Galimov Albert <wtfcrap mail.ru>
MFC after: 1 week


# 210064 14-Jul-2010 jhb

- Document layout of KTR_STRUCT payload in a comment.
- Simplify ktrstruct() calling convention by having ktrstruct() use
strlen() rather than requiring the caller to hand-code the length of
constant strings.

MFC after: 1 month


# 198411 23-Oct-2009 jhb

- Fix several off-by-one errors when using MAXCOMLEN. The p_comm[] and
td_name[] arrays are actually MAXCOMLEN + 1 in size and a few places that
created shadow copies of these arrays were just using MAXCOMLEN.
- Prefer using sizeof() of an array type to explicit constants for the
array length in a few places.
- Ensure that all of p_comm[] and td_name[] is always zero'd during
execve() to guard against any possible information leaks. Previously
trailing garbage in p_comm[] could be leaked to userland in ktrace
record headers via td_name[].

Reviewed by: bde


# 193511 05-Jun-2009 rwatson

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


# 190888 10-Apr-2009 rwatson

Remove VOP_LEASE and supporting functions. This hasn't been used since
the removal of NQNFS, but was left in in case it was required for NFSv4.
Since our new NFSv4 client and server can't use it for their
requirements, GC the old mechanism, as well as other unused lease-
related code and interfaces.

Due to its impact on kernel programming and binary interfaces, this
change should not be MFC'd.

Proposed by: jeff
Reviewed by: jeff
Discussed with: rmacklem, zach loafman @ isilon


# 189707 11-Mar-2009 jhb

Add a new type of KTRACE record for sysctl(3) invocations. It uses the
internal sysctl_sysctl_name() handler to map the MIB array to a string
name and logs this name in the trace log. This can be useful to see
exactly which sysctls a thread is invoking.

MFC after: 1 month


# 185583 03-Dec-2008 bz

Fix a credential reference leak. [1]

Close subtle but relatively unlikely race conditions when
propagating the vnode write error to other active sessions
tracing to the same vnode, without holding a reference on
the vnode anymore. [2]

PR: kern/126368 [1]
Submitted by: rwatson [2]
Reviewed by: kib, rwatson
MFC after: 4 weeks


# 176471 22-Feb-2008 des

This patch adds a new ktrace(2) record type, KTR_STRUCT, whose payload
consists of the null-terminated name and the contents of any structure
you wish to record. A new ktrstruct() function constructs and emits a
KTR_STRUCT record. It is accompanied by convenience macros for struct
stat and struct sockaddr.

In kdump(1), KTR_STRUCT records are handled by a dispatcher function
that runs stringent sanity checks on its contents before handing it
over to individual decoding funtions for each type of structure.
Currently supported structures are struct stat and struct sockaddr for
the AF_INET, AF_INET6 and AF_UNIX families; support for AF_APPLETALK
and AF_IPX is present but disabled, as I am unable to test it properly.

Since 's' was already taken, the letter 't' is used by ktrace(1) to
enable KTR_STRUCT trace points, and in kdump(1) to enable their
decoding.

Derived from patches by Andrew Li <andrew2.li@citi.com>.

PR: kern/117836
MFC after: 3 weeks


# 175294 13-Jan-2008 attilio

VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>


# 175202 09-Jan-2008 attilio

vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>


# 173601 14-Nov-2007 julian

A bunch more files that should probably print out a thread name
instead of a process name.


# 172930 24-Oct-2007 rwatson

Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

mac_<object>_<method/action>
mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer


# 172011 29-Aug-2007 jhb

Partially revert the previous change. I failed to notice that where
ktruserret() is invoked, an unlocked check of the per-process queue
is performed inline, thus, we don't lock the ktrace_sx on every userret().

Pointy hat to: jhb
Approved by: re (kensmith)
Pointy hat recovered from: rwatson


# 170685 13-Jun-2007 jhb

Improve the ktrace locking somewhat to reduce overhead:
- Depessimize userret() in kernels where KTRACE is enabled by doing an
unlocked check of the per-process queue of pending events before
acquiring any locks. Previously ktr_userret() unconditionally acquired
the global ktrace_sx lock on every return to userland for every thread,
even if ktrace wasn't enabled for the thread.
- Optimize the locking in exit() to first perform an unlocked read of
p_traceflag to see if ktrace is enabled and only acquire locks and
teardown ktrace if the test succeeds. Also, explicitly disable tracing
before draining any pending events so the pending events actually get
written out. The unlocked read is safe because proc lock is acquired
earlier after single-threading so p_traceflag can't change between then
and this check (well, it can currently due to a bug in ktrace I will fix
next, but that race existed prior to this change as well).

Reviewed by: rwatson


# 170587 11-Jun-2007 rwatson

Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.

Reviewed by: csjp
Obtained from: TrustedBSD Project


# 170152 31-May-2007 kib

Revert UF_OPENING workaround for CURRENT.
Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file.

Proposed and reviewed by: jhb
Reviewed by: daichi (unionfs)
Approved by: re (kensmith)


# 167232 05-Mar-2007 rwatson

Further system call comment cleanup:

- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
"syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.


# 167211 04-Mar-2007 rwatson

Remove 'MPSAFE' annotations from the comments above most system calls: all
system calls now enter without Giant held, and then in some cases, acquire
Giant explicitly.

Remove a number of other MPSAFE annotations in the credential code and
tweak one or two other adjacent comments.


# 166678 12-Feb-2007 mpp

Do not do a vn_close for all references to the ktraced file if we are
doing a CLEARFILE option. Do a vrele instead. This prevents
a panic later due to v_writecount being negative when the vnode
is taken off the freelist.

Submitted by: jhb


# 166073 17-Jan-2007 delphij

Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.


# 165293 16-Dec-2006 kmacy

ktrace_cv is no longer used - remove
Submitted by: Attilio Rao


# 164033 06-Nov-2006 rwatson

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# 163606 22-Oct-2006 rwatson

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


# 160858 31-Jul-2006 jhb

Trim an obsolete comment. ktrgenio() stopped doing crazy gymnastics when
ktrace was redone to be mostly synchronous again.


# 159974 27-Jun-2006 pjd

Use suser_cred(9) instead of checking cr_uid directly.

Reviewed by: rwatson


# 157233 28-Mar-2006 jhb

- Conditionalize Giant around VFS operations for ALQ, ktrace, and
generating a coredump as the result of a signal.
- Fix a bug where we could leak a Giant lock if vn_start_write() failed
in coredump().

Reported by: jmg (2)


# 155031 30-Jan-2006 jeff

- Lock access to vrele() with VFS_LOCK_GIANT() rather than mtx_lock(&Giant).

Sponsored by: Isilon Systems, Inc.


# 154739 23-Jan-2006 jhb

Fix a vnode reference leak in the ktrace code. We always grab a reference
to the vnode at the start of ktr_writerequest() but were missing the
corresponding vrele() after we finished the write operation.

Reported by: jasone


# 152430 14-Nov-2005 rwatson

In ktr_getrequest(), acquire ktrace_mtx earlier -- while the race
currently present is minor and offers no real semantic issues, it also
doesn't make sense since an earlier lockless check has already
occurred. Also hold the mutex longer, over a manipulation of
per-process ktrace state, which requires synchronization.

MFC after: 1 month
Pointed out by: jhb


# 152376 13-Nov-2005 rwatson

Moderate rewrite of kernel ktrace code to attempt to generally improve
reliability when tracing fast-moving processes or writing traces to
slow file systems by avoiding unbounded queueuing and dropped records.
Record loss was previously possible when the global pool of records
become depleted as a result of record generation outstripping record
commit, which occurred quickly in many common situations.

These changes partially restore the 4.x model of committing ktrace
records at the point of trace generation (synchronous), but maintain
the 5.x deferred record commit behavior (asynchronous) for situations
where entering VFS and sleeping is not possible (i.e., in the
scheduler). Records are now queued per-process as opposed to
globally, with processes responsible for committing records from their
own context as required.

- Eliminate the ktrace worker thread and global record queue, as they
are no longer used. Keep the global free record list, as records
are still used.

- Add a per-process record queue, which will hold any asynchronously
generated records, such as from context switches. This replaces the
global queue as the place to submit asynchronous records to.

- When a record is committed asynchronously, simply queue it to the
process.

- When a record is committed synchronously, first drain any pending
per-process records in order to maintain ordering as best we can.
Currently ordering between competing threads is provided via a global
ktrace_sx, but a per-process flag or lock may be desirable in the
future.

- When a process returns to user space following a system call, trap,
signal delivery, etc, flush any pending records.

- When a process exits, flush any pending records.

- Assert on process tear-down that there are no pending records.

- Slightly abstract the notion of being "in ktrace", which is used to
prevent the recursive generation of records, as well as generating
traces for ktrace events.

Future work here might look at changing the set of events marked for
synchronous and asynchronous record generation, re-balancing queue
depth, timeliness of commit to disk, and so on. I.e., performing a
drain every (n) records.

MFC after: 1 month
Discussed with: jhb
Requested by: Marc Olzheim <marcolz at stack dot nl>


# 151929 01-Nov-2005 rwatson

Reuse ktr_unused field in ktr_header structure as ktr_tid; populate
ktr_tid as part of gathering of ktr header data for new ktrace
records. The continued use of intptr_t is required for file layout
reasons, and cannot be changed to lwpid_t at this point.

MFC after: 1 month
Reviewed by: davidxu


# 151927 01-Nov-2005 rwatson

Replace ktr_buffer pointer in struct ktr_header with a ktr_unused
intptr_t. The buffer length needs to be written to disk as part
of the trace log, but the kernel pointer for the buffer does not.
Add a new ktr_buffer pointer to the kernel-only ktrace request
structure to hold that pointer. This frees up an integer in the
ktrace record format that can be used to hold the threadid,
although older ktrace files will have a garbage ktr_buffer field
(or more accurately, a kernel pointer value).

MFC after: 2 weeks
Space requested by: davidxu


# 147576 24-Jun-2005 pjd

Close another information leak in ktrace(2): one was able to find active
process groups outside a jail, etc. by using ktrace(2).

OK'ed by: rwatson
Approved by: re (scottl)
MFC after: 1 week


# 147520 21-Jun-2005 pjd

Add missing unlock.

Pointy hat to: pjd
Approved by: re (dwhite)


# 147183 09-Jun-2005 pjd

Remove process information leak from inside a jail, when
security.bsd.see_other_uids is set to 0, etc.
One can check if invisible process is active, by doing:

# ktrace -p <pid>

If ktrace returns 'Operation not permitted' the process is alive and
if returns 'No such process' there is no such process.

MFC after: 1 week


# 141633 10-Feb-2005 phk

Make a SYSCTL_NODE static


# 139804 06-Jan-2005 imp

/* -> /*- for copyright notices, minor format tweaks as necessary


# 132653 26-Jul-2004 cperciva

Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with: rwatson, scottl
Requested by: jhb


# 131897 10-Jul-2004 phk

Clean up and wash struct iovec and struct uio handling.

Add copyiniov() which copies a struct iovec array in from userland into
a malloc'ed struct iovec. Caller frees.

Change uiofromiov() to malloc the uio (caller frees) and name it
copyinuio() which is more appropriate.

Add cloneuio() which returns a malloc'ed copy. Caller frees.

Use them throughout.


# 127911 05-Apr-2004 imp

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# 126295 26-Feb-2004 jhb

Replace the ktrace queue's semaphore with a condition variable instead as
it is slightly more efficient since we already have a mutex to protect the
queue. Ktrace originally used a semaphore more as a proof of concept.


# 124802 21-Jan-2004 rwatson

Reduce gratuitous includes: don't include jail.h if it's not needed.
Presumably, at some point, you had to include jail.h if you included
proc.h, but that is no longer required.

Result of: self injury involving adding something to struct prison


# 122478 11-Nov-2003 jkoshy

Bound the number of iterations a thread can perform inside
ktr_resize_pool(); this eliminates a potential livelock.

Return ENOSPC only if we encountered an out-of-memory condition when
trying to increase the pool size.

Reviewed by: jhb, bde (style)


# 122457 11-Nov-2003 jkoshy

Have utrace(2) return ENOMEM if malloc() fails. Document this error
return in its manual page.

Reviewed by: jhb


# 118607 07-Aug-2003 jhb

Consistently use the BSD u_int and u_short instead of the SYSV uint and
ushort. In most of these files, there was a mixture of both styles and
this change just makes them self-consistent.

Requested by: bde (kern_ktrace.c)


# 118599 07-Aug-2003 jhb

The ktrace mutex does not need to be locked around the post of the ktrace
semaphore and doing so can lead to a possible reversal. WITNESS would have
caught this if semaphores were used more often in the kernel.

Submitted by: Ted Unangst <tedu@stanford.edu>, Dawson Engler


# 118094 27-Jul-2003 phk

Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout.


# 116182 10-Jun-2003 obrien

Use __FBSDID().


# 116101 09-Jun-2003 jhb

- Add a td_pflags field to struct thread for private flags accessed only by
curthread. Unlike td_flags, this field does not need any locking.
- Replace the td_inktr and td_inktrace variables with equivalent private
thread flags.
- Move TDF_OLDMASK over to the private flags field so it no longer requires
sched_lock.


# 114026 25-Apr-2003 jhb

- Push down Giant around vnode operations in ktrace().
- Mark the ktrace() and utrace() syscalls as being MP safe.
- Validate the facs argument to ktrace() prior to doing any vnode
operations or acquiring any locks.
- Share lock the proctree lock over the entire section that calls
ktrsetchildren() and ktrops(). We already did this for process groups.
Doing it for the process case closes a small race where a process might
go away after we look it up. As a result of this, ktrstchildren() now
just asserts that the proctree lock is locked rather than acquiring the
lock itself.
- Add some missing comments to #else and #endif.


# 112199 13-Mar-2003 jhb

Add a new userland-visible ktrace flag KTR_DROP and an internal ktrace flag
KTRFAC_DROP to track instances when ktrace events are dropped due to the
request pool being exhausted. When a thread tries to post a ktrace event
and is unable to due to no available ktrace request objects, it sets
KTRFAC_DROP in its process' p_traceflag field. The next trace event to
successfully post from that process will set the KTR_DROP flag in the
header of the request going out and clear KTRFAC_DROP.

The KTR_DROP flag is the high bit in the type field of the ktr_header
structure. Older kdump binaries will simply complain about an unknown type
when seeing an entry with KTR_DROP set. Note that KTR_DROP being set on a
record in a ktrace file does not tell you anything except that at least one
event from this process was dropped prior to this event. The user has no
way of knowing what types of events were dropped nor how many were dropped.

Requested by: phk


# 112198 13-Mar-2003 jhb

- Cache a reference to the credential of the thread that starts a ktrace in
struct proc as p_tracecred alongside the current cache of the vnode in
p_tracep. This credential is then used for all later ktrace operations on
this file rather than using the credential of the current thread at the
time of each ktrace event.
- Now that we have multiple ktrace-related items in struct proc that are
pointers, rename p_tracep to p_tracevp to make it less ambiguous.

Requested by: rwatson (1)


# 111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 104354 02-Oct-2002 scottl

Some kernel threads try to do significant work, and the default KSTACK_PAGES
doesn't give them enough stack to do much before blowing away the pcb.
This adds MI and MD code to allow the allocation of an alternate kstack
who's size can be speficied when calling kthread_create. Passing the
value 0 prevents the alternate kstack from being created. Note that the
ia64 MD code is missing for now, and PowerPC was only partially written
due to the pmap.c being incomplete there.
Though this patch does not modify anything to make use of the alternate
kstack, acpi and usb are good candidates.

Reviewed by: jake, peter, jhb


# 104230 30-Sep-2002 phk

Plug memory leaks.

Detected by: FlexeLint
Approved by: jhb


# 103237 11-Sep-2002 jhb

- Change utrace ktrace events to malloc the work buffer before getting a
request structure.
- Re-optimize the case of utrace being disabled by doing an explicit
KTRPOINT check instead of relying on the one in ktr_getrequest() so that
we don't waste time on a malloc in the non-tracing case.
- Change utrace() to return an error if the copyin() fails. Before it
would just ignore the request but still return success. This last is
a change in behavior and can be backed out if necessary.


# 103236 11-Sep-2002 jhb

Remove support for synchronous ktrace requests now that none exist anymore.
They were an ugly, gross hack.


# 103235 11-Sep-2002 jhb

- Change ktrace genio events to only copy up to ktr_geniosize bytes of a
transfer to a malloc'd buffer and use that bufer for the ktrace event.
This means that genio ktrace events no longer need to be synchronous.
- Now that ktr_buffer isn't overloaded to sometimes point to a cached uio
pointer for genio requests and always points to a malloc'd buffer if not
NULL, free the buffer in ktr_freerequest() instead of in
ktr_writerequest(). This closes a memory leak for ktrace events that
used a malloc'd buffer that had their vnode ripped out from under them
while they were on the todo list.

Suggested by: bde (1, in principle)


# 103234 11-Sep-2002 jhb

- Add a kern.ktrace sysctl node.
- Rename kern.ktrace_request_pool tunable/sysctl to
kern.ktrace.request_pool.
- Add a variable to control the max amount of data to log for genio events.
This variable is tunable via the tunable/sysctl kern.ktrace.genio_size
and defaults to one page.


# 103233 11-Sep-2002 jhb

Change namei and syscall ktrace events to malloc work buffers before
obtaining a ktr_request structure from the free pool so we can avoid
starving other threads of ktr_request structures.


# 102129 19-Aug-2002 rwatson

Pass active_cred and file_cred into the MAC framework explicitly
for mac_check_vnode_{poll,read,stat,write}(). Pass in fp->f_cred
when calling these checks with a struct file available. Otherwise,
pass NOCRED. All currently MAC policies use active_cred, but
could now offer the cached credential semantic used for the base
system security model.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 102112 19-Aug-2002 rwatson

Break out mac_check_vnode_op() into three seperate checks:
mac_check_vnode_poll(), mac_check_vnode_read(), mac_check_vnode_write().
This improves the consistency with other existing vnode checks, and
allows policies to avoid implementing switch statements to determine
what operations they do and do not want to authorize.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 101153 01-Aug-2002 jhb

If we fail to write to a vnode during a ktrace write, then we drop all
other references to that vnode as a trace vnode in other processes as well
as in any pending requests on the todo list. Thus, it is possible for a
ktrace request structure to have a NULL ktr_vp when it is destroyed in
ktr_freerequest(). We shouldn't call vrele() on the vnode in that case.

Reported by: bde


# 101123 31-Jul-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument the ktrace write operation so that it invokes the MAC
framework's vnode write authorization check.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 99009 28-Jun-2002 alfred

More caddr_t removal, make fo_ioctl take a void * instead of a caddr_t.


# 97993 07-Jun-2002 jhb

Overhaul the ktrace subsystem a bit. For the most part, the actual vnode
operations to dump a ktrace event out to an output file are now handled
asychronously by a ktrace worker thread. This enables most ktrace events
to not need Giant once p_tracep and p_traceflag are suitably protected by
the new ktrace_lock.

There is a single todo list of pending ktrace requests. The various
ktrace tracepoints allocate a ktrace request object and tack it onto the
end of the queue. The ktrace kernel thread grabs requests off the head of
the queue and processes them using the trace vnode and credentials of the
thread triggering the event.

Since we cannot assume that the user memory referenced when doing a
ktrgenio() will be valid and since we can't access it from the ktrace
worker thread without a bit of hassle anyways, ktrgenio() requests are
still handled synchronously. However, in order to ensure that the requests
from a given thread still maintain relative order to one another, when a
synchronous ktrace event (such as a genio event) is triggered, we still put
the request object on the todo list to synchronize with the worker thread.
The original thread blocks atomically with putting the item on the queue.
When the worker thread comes across an asynchronous request, it wakes up
the original thread and then blocks to ensure it doesn't manage to write a
later event before the original thread has a chance to write out the
synchronous event. When the original thread wakes up, it writes out the
synchronous using its own context and then finally wakes the worker thread
back up. Yuck. The sychronous events aren't pretty but they do work.

Since ktrace events can be triggered in fairly low-level areas (msleep()
and cv_wait() for example) the ktrace code is designed to use very few
locks when posting an event (currently just the ktrace_mtx lock and the
vnode interlock to bump the refcoun on the trace vnode). This also means
that we can't allocate a ktrace request object when an event is triggered.
Instead, ktrace request objects are allocated from a pre-allocated pool
and returned to the pool after a request is serviced.

The size of this pool defaults to 100 objects, which is about 13k on an
i386 kernel. The size of the pool can be adjusted at compile time via the
KTRACE_REQUEST_POOL kernel option, at boot time via the
kern.ktrace_request_pool loader tunable, or at runtime via the
kern.ktrace_request_pool sysctl.

If the pool of request objects is exhausted, then a warning message is
printed to the console. The message is rate-limited in that it is only
printed once until the size of the pool is adjusted via the sysctl.

I have tested all kernel traces but have not tested user traces submitted
by utrace(2), though they should work fine in theory.

Since a ktrace request has several properties (content of event, trace
vnode, details of originating process, credentials for I/O, etc.), I chose
to drop the first argument to the various ktrfoo() functions. Currently
the functions just assume the event is posted from curthread. If there is
a great desire to do so, I suppose I could instead put back the first
argument but this time make it a thread pointer instead of a vnode pointer.

Also, KTRPOINT() now takes a thread as its first argument instead of a
process. This is because the check for a recursive ktrace event is now
per-thread instead of process-wide.

Tested on: i386
Compiles on: sparc64, alpha


# 96886 18-May-2002 jhb

Change p_can{debug,see,sched,signal}()'s first argument to be a thread
pointer instead of a proc pointer and require the process pointed to
by the second argument to be locked. We now use the thread ucred reference
for the credential checks in p_can*() as a result. p_canfoo() should now
no longer need Giant.


# 94861 16-Apr-2002 jhb

Lock proctree_lock instead of pgrpsess_lock.


# 94618 13-Apr-2002 jhb

- Change the first argument of ktrcanset(), ktrsetchildren(), and ktrops()
to a thread pointer so that ktrcanset() can use td_ucred.
- Add some proc locking to partially protect p_tracep and p_traceflag.


# 93593 01-Apr-2002 jhb

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


# 92723 19-Mar-2002 alfred

Remove __P.


# 92310 15-Mar-2002 alfred

Giant pushdown for read/write/pread/pwrite syscalls.

kern/kern_descrip.c:
Aquire Giant in fdrop_locked when file refcount hits zero, this removes
the requirement for the caller to own Giant for the most part.

kern/kern_ktrace.c:
Aquire Giant in ktrgenio, simplifies locking in upper read/write syscalls.

kern/vfs_bio.c:
Aquire Giant in bwillwrite if needed.

kern/sys_generic.c
Giant pushdown, remove Giant for:
read, pread, write and pwrite.
readv and writev aren't done yet because of the possible malloc calls
for iov to uio processing.

kern/sys_socket.c
Grab giant in the socket fo_read/write functions.

kern/vfs_vnops.c
Grab giant in the vnode fo_read/write functions.


# 91416 27-Feb-2002 jhb

Add a comment about an unlocked access to p_ucred that will go away in
the near future.


# 91406 27-Feb-2002 jhb

Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.


# 91140 23-Feb-2002 tanimura

Lock struct pgrp, session and sigio.

New locks are:

- pgrpsess_lock which locks the whole pgrps and sessions,
- pg_mtx which protects the pgrp members, and
- s_mtx which protects the session members.

Please refer to sys/proc.h for the coverage of these locks.

Changes on the pgrp/session interface:

- pgfind() needs the pgrpsess_lock held.

- The caller of enterpgrp() is responsible to allocate a new pgrp and
session.

- Call enterthispgrp() in order to enter an existing pgrp.

- pgsignal() requires a pgrp lock held.

Reviewed by: jhb, alfred
Tested on: cvsup.jp.FreeBSD.org
(which is a quad-CPU machine running -current)


# 85397 23-Oct-2001 dillon

Fix ktrace enablement/disablement races that can result in a vnode
ref count panic.

Bug noticed by: ps
Reviewed by: ps
MFC after: 1 day


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 82585 30-Aug-2001 dillon

Remove the MPSAFE keyword from the parser for syscalls.master.
Instead introduce the [M] prefix to existing keywords. e.g.
MSTD is the MP SAFE version of STD. This is prepatory for a
massive Giant lock pushdown. The old MPSAFE keyword made
syscalls.master too messy.

Begin comments MP-Safe procedures with the comment:
/*
* MPSAFE
*/
This comments means that the procedure may be called without
Giant held (The procedure itself may still need to obtain
Giant temporarily to do its thing).

sv_prepsyscall() is now MP SAFE and assumed to be MP SAFE
sv_transtrap() is now MP SAFE and assumed to be MP SAFE

ktrsyscall() and ktrsysret() are now MP SAFE (Giant Pushdown)
trapsignal() is now MP SAFE (Giant Pushdown)

Places which used to do the if (mtx_owned(&Giant)) mtx_unlock(&Giant)
test in syscall[2]() in */*/trap.c now do not. Instead they
explicitly unlock Giant if they previously obtained it, and then
assert that it is no longer held to catch broken system calls.

Rebuild syscall tables.


# 79335 05-Jul-2001 rwatson

o Replace calls to p_can(..., P_CAN_xxx) with calls to p_canxxx().
The p_can(...) construct was a premature (and, it turns out,
awkward) abstraction. The individual calls to p_canxxx() better
reflect differences between the inter-process authorization checks,
such as differing checks based on the type of signal. This has
a side effect of improving code readability.
o Replace direct credential authorization checks in ktrace() with
invocation of p_candebug(), while maintaining the special case
check of KTR_ROOT. This allows ktrace() to "play more nicely"
with new mandatory access control schemes, as well as making its
authorization checks consistent with other "debugging class"
checks.
o Eliminate "privused" construct for p_can*() calls which allowed the
caller to determine if privilege was required for successful
evaluation of the access control check. This primitive is currently
unused, and as such, serves only to complicate the API.

Approved by: ({procfs,linprocfs} changes) des
Obtained from: TrustedBSD Project


# 77183 25-May-2001 rwatson

o Merge contents of struct pcred into struct ucred. Specifically, add the
real uid, saved uid, real gid, and saved gid to ucred, as well as the
pcred->pc_uidinfo, which was associated with the real uid, only rename
it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
original macro that pointed.
p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
we figure out locking and optimizations; generally speaking, this
means moving to a structure like this:
newcred = crdup(oldcred);
...
p->p_ucred = newcred;
crfree(oldcred);
It's not race-free, but better than nothing. There are also races
in sys_process.c, all inter-process authorization, fork, exec, and
exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
allocation.
o Clean up ktrcanset() to take into account changes, and move to using
suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
calls to better document current behavior. In a couple of places,
current behavior is a little questionable and we need to check
POSIX.1 to make sure it's "right". More commenting work still
remains to be done.
o Update credential management calls, such as crfree(), to take into
account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
change_euid()
change_egid()
change_ruid()
change_rgid()
change_svuid()
change_svgid()
In each case, the call now acts on a credential not a process, and as
such no longer requires more complicated process locking/etc. They
now assume the caller will do any necessary allocation of an
exclusive credential reference. Each is commented to document its
reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
processes and pcreds. Note that this authorization, as well as
CANSIGIO(), needs to be updated to use the p_cansignal() and
p_cansched() centralized authorization routines, as they currently
do not take into account some desirable restrictions that are handled
by the centralized routines, as well as being inconsistent with other
similar authorization instances.
o Update libkvm to take these changes into account.

Obtained from: TrustedBSD Project
Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit


# 76166 01-May-2001 markm

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


# 75893 23-Apr-2001 jhb

Change the pfind() and zpfind() functions to lock the process that they
find before releasing the allproc lock and returning.

Reviewed by: -smp, dfr, jake


# 74927 28-Mar-2001 jhb

Convert the allproc and proctree locks from lockmgr locks to sx locks.


# 72786 21-Feb-2001 rwatson

o Move per-process jail pointer (p->pr_prison) to inside of the subject
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
pr_free(), invoked by the similarly named credential reference
management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
mutex use.

Notes:

o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
required to protect the reference count plus some fields in the
structure.

Reviewed by: freebsd-arch
Obtained from: TrustedBSD Project


# 70792 08-Jan-2001 alfred

Don't use SCARG.

Pointed out by: bde


# 70707 06-Jan-2001 alfred

Limit size of passed in data for utrace function.
Requested by: rwatson
Obtained from: NetBSD


# 70317 23-Dec-2000 jake

Protect proc.p_pptr and proc.p_children/p_sibling with the
proctree_lock.

linprocfs not locked pending response from informal maintainer.

Reviewed by: jhb, -smp@


# 69947 12-Dec-2000 jake

- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr. Also provides macros for the flags
pased to specify shared, exclusive or release which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.


# 69022 22-Nov-2000 jake

Protect the following with a lockmgr lock:

allproc
zombproc
pidhashtbl
proc.p_list
proc.p_hash
nextpid

Reviewed by: jhb
Obtained from: BSD/OS and netbsd


# 67708 27-Oct-2000 phk

Convert all users of fldoff() to offsetof(). fldoff() is bad
because it only takes a struct tag which makes it impossible to
use unions, typedefs etc.

Define __offsetof() in <machine/ansi.h>

Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>

Remove myriad of local offsetof() definitions.

Remove includes of <stddef.h> in kernel code.

NB: Kernelcode should *never* include from /usr/include !

Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.

Deprecate <struct.h> with a warning. The warning turns into an error on
01-12-2000 and the file gets removed entirely on 01-01-2001.

Paritials reviews by: various.
Significant brucifications by: bde


# 65556 06-Sep-2000 jasone

Add KTR, a facility that logs kernel events in order to to facilitate
debugging.

Acquired from: BSDi (BSD/OS)
Submitted by: dfr, grog, jake, jhb


# 65237 30-Aug-2000 rwatson

o Centralize inter-process access control, introducing:

int p_can(p1, p2, operation, privused)

which allows specification of subject process, object process,
inter-process operation, and an optional call-by-reference privused
flag, allowing the caller to determine if privilege was required
for the call to succeed. This allows jail, kern.ps_showallprocs and
regular credential-based interaction checks to occur in one block of
code. Possible operations are P_CAN_SEE, P_CAN_SCHED, P_CAN_KILL,
and P_CAN_DEBUG. p_can currently breaks out as a wrapper to a
series of static function checks in kern_prot, which should not
be invoked directly.

o Commented out capabilities entries are included for some checks.

o Update most inter-process authorization to make use of p_can() instead
of manual checks, PRISON_CHECK(), P_TRESPASS(), and
kern.ps_showallprocs.

o Modify suser{,_xxx} to use const arguments, as it no longer modifies
process flags due to the disabling of ASU.

o Modify some checks/errors in procfs so that ENOENT is returned instead
of ESRCH, further improving concealment of processes that should not
be visible to other processes. Also introduce new access checks to
improve hiding of processes for procfs_lookup(), procfs_getattr(),
procfs_readdir(). Correct a bug reported by bp concerning not
handling the CREATE case in procfs_lookup(). Remove volatile flag in
procfs that caused apparently spurious qualifier warnigns (approved by
bde).

o Add comment noting that ktrace() has not been updated, as its access
control checks are different from ptrace(), whereas they should
probably be the same. Further discussion should happen on this topic.

Reviewed by: bde, green, phk, freebsd-security, others
Approved by: bde
Obtained from: TrustedBSD Project


# 62976 11-Jul-2000 mckusick

Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).


# 62791 07-Jul-2000 green

Change that &@!$# UIO_READ to be UIO_WRITE. I tested the ktrace stuff,
but somehow... pass the pointy hat, again!


# 62550 04-Jul-2000 mckusick

Move the truncation code out of vn_open and into the open system call
after the acquisition of any advisory locks. This fix corrects a case
in which a process tries to open a file with a non-blocking exclusive
lock. Even if it fails to get the lock it would still truncate the
file even though its open failed. With this change, the truncation
is done only after the lock is successfully acquired.

Obtained from: BSD/OS


# 62378 02-Jul-2000 green

Modify ktrace's general I/O tracing, ktrgenio(), to use a struct uio *
instead of a struct iovec * array and int len. Get rid of stupidly trying
to allocate all of the memory and copyin()ing the entire iovec[], and
instead just do the proper VOP_WRITE() in ktrwrite() using a copy of
the struct uio that the syscall originally used.

This solves the DoS which could easily be performed; to work around the
DoS, one could also remove "options KTRACE" from the kernel. This is
a very strong MFC candidate for 4.1.

Found by: art@OpenBSD.org


# 59794 30-Apr-2000 phk

Remove unneeded #include <vm/vm_zone.h>

Generated by: src/tools/tools/kerninclude


# 54655 15-Dec-1999 eivind

Introduce NDFREE (and remove VOP_ABORTOP)


# 53212 16-Nov-1999 phk

This is a partial commit of the patch from PR 14914:

Alot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY
structures for list operations. This patch makes all list operations
in sys/kern use the queue(3) macros, rather than directly accessing the
*Q_{HEAD,ENTRY} structures.

This batch of changes compile to the same object files.

Reviewed by: phk
Submitted by: Jake Burkholder <jake@checker.org>
PR: 14914


# 51941 04-Oct-1999 marcel

Fix style bug.

Submitted by: bde


# 51791 29-Sep-1999 marcel

sigset_t change (part 2 of 5)
-----------------------------

The core of the signalling code has been rewritten to operate
on the new sigset_t. No methodological changes have been made.
Most references to a sigset_t object are through macros (see
signalvar.h) to create a level of abstraction and to provide
a basis for further improvements.

The NSIG constant has not been changed to reflect the maximum
number of signals possible. The reason is that it breaks
programs (especially shells) which assume that all signals
have a non-null name in sys_signame. See src/bin/sh/trap.c
for an example. Instead _SIG_MAXSIG has been introduced to
hold the maximum signal possible with the new sigset_t.

struct sigprop has been moved from signalvar.h to kern_sig.c
because a) it is only used there, and b) access must be done
though function sigprop(). The latter because the table doesn't
holds properties for all signals, but only for the first NSIG
signals.

signal.h has been reorganized to make reading easier and to
add the new and/or modified structures. The "old" structures
are moved to signalvar.h to prevent namespace polution.

Especially the coda filesystem suffers from the change, because
it contained lines like (p->p_sigmask == SIGIO), which is easy
to do for integral types, but not for compound types.

NOTE: kdump (and port linux_kdump) must be recompiled.

Thanks to Garrett Wollman and Daniel Eischen for pressing the
importance of changing sigreturn as well.


# 51492 21-Sep-1999 green

Kill some spammage that seems to have gotten in through diffs from marcel's
local tree (which happens to have some things we don't :)


# 51484 20-Sep-1999 marcel

When bcopying the program name into the ktrace header, make sure we include
the terminating zero by copying MAXCOMLEN + 1 bytes. This fixes the garbage
that occasionally appeared behind the programname when it is at least MAXCOMLEN
bytes long (such as communicator-4.61-bin).


# 50668 30-Aug-1999 dima

ktrace should not follow symlinks either.

Suggested by: bde


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 47955 16-Jun-1999 dt

Make sure syscall arguments properly aligned in ktrace records.

Make syscall return value a register_t.

Based on a patch from Hidetoshi Shimokawa.
Mostly reviewed by: Hidetoshi Shimokawa and Bruce Evans.


# 46155 28-Apr-1999 phk

This Implements the mumbled about "Jail" feature.

This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

I have no scripts for setting up a jail, don't ask me for them.

The IP number should be an alias on one of the interfaces.

mount a /proc in each jail, it will make ps more useable.

/proc/<pid>/status tells the hostname of the prison for
jailed processes.

Quotas are only sensible if you have a mountpoint per prison.

There are no privisions for stopping resource-hogging.

Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/


# 41628 09-Dec-1998 rvb

In ktrwrite, use uio_procp = curproc vs 0


# 41059 10-Nov-1998 peter

add #include <sys/kernel.h> where it's needed by MALLOC_DEFINE()


# 33678 20-Feb-1998 bde

Don't depend on "implicit int".


# 31561 05-Dec-1997 bde

Don't include <sys/lock.h> in headers when only `struct simplelock' is
required. Fixed everything that depended on the pollution.


# 30994 06-Nov-1997 phk

Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.


# 30354 12-Oct-1997 phk

Last major round (Unless Bruce thinks of somthing :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


# 30309 11-Oct-1997 phk

Distribute and statizice a lot of the malloc M_* types.

Substantial input from: bde


# 24131 23-Mar-1997 bde

Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined.
Fixed everything that depended on getting fcntl.h stuff from the wrong
place. Most things don't depend on file.h stuff at all.


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 22521 10-Feb-1997 dyson

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 18469 22-Sep-1996 phk

Remove the extra length field from the utrace entries. It's redundant.


# 18398 19-Sep-1996 phk

Add the utrace(caddr_t addr,size_t len) syscall, that will store the
data pointed at in a ktrace file, if this process is being ktrace'ed.
I'm using this to profile malloc usage.
The advantage is that there is no context around this call, ie, no
open file or socket, so it will work in any process, and you can
decide if you want it to collect data or not.


# 17429 04-Aug-1996 phk

Add separate kmalloc classes for BIO buffers and Ktrace info.


# 14529 11-Mar-1996 hsu

From Lite2: proc LIST changes.
Reviewed by: david & bde


# 13203 03-Jan-1996 wollman

Converted two options over to the new scheme: USER_LDT and KTRACE.


# 12819 14-Dec-1995 phk

A Major staticize sweep. Generates a couple of warnings that I'll deal
with later.
A number of unused vars removed.
A number of unused procs removed or #ifdefed.


# 12577 02-Dec-1995 bde

Completed function declarations and/or added prototypes.


# 12221 12-Nov-1995 bde

Included <sys/sysproto.h> to get central declarations for syscall args
structs and prototypes for syscalls.

Ifdefed duplicated decentralized declarations of args structs. It's
convenient to have this visible but they are hard to maintain. Some
are already different from the central declarations. 4.4lite2 puts
them in comments in the function headers but I wanted to avoid the
large changes for that.


# 8876 30-May-1995 rgrimes

Remove trailing whitespace.


# 3308 02-Oct-1994 phk

All of this is cosmetic. prototypes, #includes, printfs and so on. Makes
GCC a lot more silent.


# 2112 18-Aug-1994 wollman

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


# 1817 02-Aug-1994 dg

Added $Id$


# 1549 25-May-1994 rgrimes

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# 1542 24-May-1994 rgrimes

This commit was generated by cvs2svn to compensate for changes in r1541,
which included commits to RCS files with non-trunk default branches.


# 1541 24-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources