History log of /freebsd-9.3-release/sys/kern/kern_sig.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 267654 19-Jun-2014 gjb

Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 262057 17-Feb-2014 avg

MFC r258622,258675: dtrace sdt: remove the ugly sname parameter of
SDT_PROBE_DEFINE


# 260227 03-Jan-2014 jilles

MFC r258281: Fix siginfo_t.si_status for wait6/waitid/SIGCHLD.

Per POSIX, si_status should contain the value passed to exit() for
si_code==CLD_EXITED and the signal number for other si_code. This was
incorrect for CLD_EXITED and CLD_DUMPED.

This is still not fully POSIX-compliant (Austin group issue #594 says that
the full value passed to exit() shall be returned via si_status, not just
the low 8 bits) but is sufficient for a si_status-related test in libnih
(upstart, Debian/kFreeBSD).

PR: kern/184002


# 255763 21-Sep-2013 markj

MFC r252894:
Add SDT_PROBE_DEFINE0 for consistency with SDT_PROBE0.

MFC r253022:
Also define SDT_PROBE_DEFINE0 for the !KDTRACE_HOOKS case.

MFC r254266:
Add event handlers for module load and unload events. The load handlers are
called after the module has been loaded, and the unload handlers are called
before the module is unloaded. Moreover, the module unload handlers may
return an error to prevent the unload from proceeding.

MFC r254267:
Remove some unused fields from struct linker_file. They were added in
r172862 for use by the DTrace SDT framework but don't seem to have ever
been used.

MFC r254268:
FreeBSD's DTrace implementation has a few problems with respect to handling
probes declared in a kernel module when that module is unloaded. In
particular,

* Unloading a module with active SDT probes will cause a panic. [1]
* A module's (FBT/SDT) probes aren't destroyed when the module is unloaded;
trying to use them after the fact will generally cause a panic.

This change fixes both problems by porting the DTrace module load/unload
handlers from illumos and registering them with the corresponding
EVENTHANDLER(9) handlers. This allows the DTrace framework to destroy all
probes defined in a module when that module is unloaded, and to prevent a
module unload from proceeding if some of its probes are active. The latter
problem has already been fixed for FBT probes by checking lf->nenabled in
kern_kldunload(), but moving the check into the DTrace framework generalizes
it to all kernel providers and also fixes a race in the current
implementation (since a probe may be activated between the check and the
call to linker_file_unload()).

Additionally, the SDT implementation has been reworked to define SDT
providers/probes/argtypes in linker sets rather than using SYSINIT/SYSUNINIT
to create and destroy SDT probes when a module is loaded or unloaded. This
simplifies things quite a bit since it means that pretty much all of the SDT
code can live in sdt.ko, and since it becomes easier to integrate SDT with
the DTrace framework. Furthermore, this allows FreeBSD to be quite flexible
in that SDT providers spanning multiple modules can be created on the fly
when a module is loaded; at the moment it looks like illumos' SDT
implementation requires all SDT probes to be statically defined in a single
kernel table.

MFC r254309:
Use kld_{load,unload} instead of mod_{load,unload} for the linker file load
and unload event handlers added in r254266.

MFC r254350:
Specify SDT probe argument types in the probe definition itself rather than
using SDT_PROBE_ARGTYPE(). This will make it easy to extend the SDT(9) API
to allow probes with dynamically-translated types.


# 253373 15-Jul-2013 gavin

Merge r244451 from head (originally by pjd):
Use correct file permissions when looking for available core file if
kern.corefile contains %I.

Discussed with: pjd (some time ago)
Approved by: re (kib)


# 251147 30-May-2013 jhb

MFC 246417,247116,248584:
Rework the handling of stop signals in the NFS client. The changes in
195702, 195703, and 195821 prevented a thread from suspending while holding
locks inside of NFS by forcing the thread to fail sleeps with EINTR or
ERESTART but defer the thread suspension to the user boundary. However,
this had the effect that stopping a process during an NFS request could
abort the request and trigger EINTR errors that were visible to userland
processes (previously the thread would have suspended and completed the
request once it was resumed).

This change instead effectively masks stop signals while in the NFS client.
It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot
be masked directly. Instead of setting PBDRY on individual sleeps, change
the VFS_*() and VOP_*() methods to defer stop signals for filesystems which
request this behavior via a new VFCF_SBDRY flag. Note that this has to be
a VFC flag rather than a MNTK flag so that it works properly with
VFS_MOUNT() when the mount is not yet fully constructed. For now, only the
NFS clients set this new flag in VFS_SET().

A few other related changes:
- Add an assertion to ensure that TDF_SBDRY doesn't leak to userland.
- When a lookup request uses VOP_READLINK() to follow a symlink, mark
the request as being on behalf of the thread performing the lookup
(cnp_thread) rather than using a NULL thread pointer. This causes
NFS to properly handle signals during this VOP on an interruptible
mount.
- Ignore thread suspend requests due to SIGSTOP if stop signals are
currently deferred. This can occur if a process is stopped via
SIGSTOP while a thread is running or runnable but before it has set
TDF_SBDRY.


# 248085 09-Mar-2013 marius

MFC: r227309 (partial)

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# 247349 26-Feb-2013 jhb

MFC 240467:
Ignore stop and continue signals sent to an exiting process. Stop signals
set p_xstat to the signal that triggered the stop, but p_xstat is also
used to hold the exit status of an exiting process. Without this change,
a stop signal that arrived after a process was marked P_WEXIT but before
it was marked a zombie would overwrite the exit status with the stop signal
number.


# 247077 21-Feb-2013 kib

MFC r246484:
Allow ptrace(2) operation on the child created by vfork(2), if the
debugger is not the parent.


# 242283 29-Oct-2012 eadler

MFC r241855,r241859:
Update the kill(2) and killpg(2) man pages to the modern permission
checks. Also indicate killpg(2) is POSIX compliant.

Correct the killpg(2) return values:

Return EPERM if processes were found but they
were unable to be signaled.

Return the first error from p_cansignal if no signal was successful.

Discussed with: jilles
Approved by: cperciva (implicit)


# 239754 27-Aug-2012 kib

MFC r239374:
Deliver SIGSYS to the guilty thread, not to the process.


# 234455 19-Apr-2012 kib

MFC r234172:
Add thread-private flag to indicate that error value is already placed
in td_errno. Flag is supposed to be used by syscalls returning
EJUSTRETURN because errno was already placed into the usermode frame
by a call to set_syscall_retval(9). Both ktrace and dtrace get errno
value from td_errno if the flag is set.

Use the flag to fix sigsuspend(2) error return ktrace records.


# 229485 04-Jan-2012 pluknet

MFC r226882:
Fix arguments list for proc:::signal-discard DTrace probe.

Reported by: Anton Yuzhaninov <citrin citrin ru>


# 226199 10-Oct-2011 kib

MFC r225894:
The sigwait(3) function shall not return EINTR, according to the
POSIX/SUSvN. The sigwait(2) syscall does return EINTR, and libc.so.7
contains the wrapper sigwait(3) which hides EINTR from callers. The
EINTR return is used by libthr to handle required cancellation point
in the sigwait(3).

To help the binaries linked against pre-libc.so.7, i.e. RELENG_6 and
earlier, to have right ABI for sigwait(3), transform EINTR return from
sigwait(2) into ERESTART.

Approved by: re (bz)


# 225736 22-Sep-2011 kensmith

Copy head to stable/9 as part of 9.0-RELEASE release cycle.

Approved by: re (implicit)


# 225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


# 224987 18-Aug-2011 jonathan

Add experimental support for process descriptors

A "process descriptor" file descriptor is used to manage processes
without using the PID namespace. This is required for Capsicum's
Capability Mode, where the PID namespace is unavailable.

New system calls pdfork(2) and pdkill(2) offer the functional equivalents
of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote
process for debugging purposes. The currently-unimplemented pdwait(2) will,
in the future, allow querying rusage/exit status. In the interim, poll(2)
may be used to check (and wait for) process termination.

When a process is referenced by a process descriptor, it does not issue
SIGCHLD to the parent, making it suitable for use in libraries---a common
scenario when using library compartmentalisation from within large
applications (such as web browsers). Some observers may note a similarity
to Mach task ports; process descriptors provide a subset of this behaviour,
but in a UNIX style.

This feature is enabled by "options PROCDESC", but as with several other
Capsicum kernel features, is not enabled by default in GENERIC 9.0.

Reviewed by: jhb, kib
Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc


# 222320 26-May-2011 trasz

Fix support for RACCT_CORE by merging forgotten file.


# 220740 17-Apr-2011 jilles

ktrace: Log the code for all signals (PSIG events).

The code provides information on how the signal was generated.

Formerly, the code was only logged for traps, much like only signal handlers
for traps received a meaningful si_code before FreeBSD 7.0.

In rare cases, no information is available and 0 is still logged.

MFC after: 1 week


# 220390 06-Apr-2011 jhb

Fix several places to ignore processes that are not yet fully constructed.

MFC after: 1 week


# 219905 23-Mar-2011 jhb

Small style fix.


# 217819 25-Jan-2011 kib

Allow debugger to specify that children of the traced process should be
automatically traced. Extend the ptrace(PL_LWPINFO) to report that child
just forked.

Reviewed by: davidxu, jhb
MFC after: 2 weeks


# 213829 14-Oct-2010 davidxu

In kern_sigtimedwait(), move initialization code out of process lock,
instead of using SIGISMEMBER to test every interesting signal, just
unmask the signal set and let cursig() return one, get the signal
after it returns, call reschedule_signal() after signals are blocked
again.

In kern_sigprocmask(), don't call reschedule_signal() when it is
unnecessary.

In reschedule_signal(), replace SIGISEMPTY() + SIGISMEMBER() with
sig_ffs(), rename variable 'i' to sig.


# 213761 13-Oct-2010 davidxu

sigqueue_collect_set() is no longer needed because other functions
maintain pending set correctly.


# 213642 09-Oct-2010 davidxu

Create a global thread hash table to speed up thread lookup, use
rwlock to protect the table. In old code, thread lookup is done with
process lock held, to find a thread, kernel has to iterate through
process and thread list, this is quite inefficient.
With this change, test shows in extreme case performance is
dramatically improved.

Earlier patch was reviewed by: jhb, julian


# 212425 10-Sep-2010 mdf

Replace sbuf_overflowed() with sbuf_error(), which returns any error
code associated with overflow or with the drain function. While this
function is not expected to be used often, it produces more information
in the form of an errno that sbuf_overflowed() did.


# 212075 31-Aug-2010 davidxu

rescure comments from RELENG_4.


# 212047 31-Aug-2010 davidxu

If a process is being debugged, skips job control caused by SIGSTOP/SIGCONT
signals, because it is managed by debugger, however a normal signal sent to
a interruptibly sleeping thread wakes up the thread so it will handle the
signal when the process leaves the stopped state.

PR: 150138
MFC after: 1 week


# 211616 22-Aug-2010 rpaulo

Add an extra comment to the SDT probes definition. This allows us to get
use '-' in probe names, matching the probe names in Solaris.[1]

Add userland SDT probes definitions to sys/sdt.h.

Sponsored by: The FreeBSD Foundation
Discussed with: rwaston [1]


# 210274 20-Jul-2010 davidxu

Fix function name in error messages.


# 209819 08-Jul-2010 jhb

- Various style and whitespace fixes.
- Make sugid_coredump and kern_logsigexit private to kern_sig.c.

Submitted by: bde (partially)
MFC after: 1 month


# 209688 04-Jul-2010 kib

Extend ptrace(PT_LWPINFO) to report siginfo for the signal that caused
debugee stop. The change should keep the ABI. Take care of compat32.

Discussed with: davidxu, jhb
MFC after: 2 weeks


# 209592 29-Jun-2010 jhb

Tweak the in-kernel API for sending signals to threads:
- Rename tdsignal() to tdsendsignal() and make it private to kern_sig.c.
- Add tdsignal() and tdksignal() routines that mirror psignal() and
pksignal() except that they accept a thread as an argument instead of
a process. They send a signal to a specific thread rather than to an
individual process.

Reviewed by: kib


# 209389 21-Jun-2010 kib

Do not report a stack garbage as the old value for debug.ncores sysctl.

Reported by: brucec


# 208453 23-May-2010 kib

Reorganize syscall entry and leave handling.

Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month


# 207418 30-Apr-2010 alfred

Avoid allocating MAXHOSTNAMELEN bytes on the stack in expand_name(),
use the heap instead.

Obtained from: Juniper Networks

Reviewed by: jhb


# 206264 06-Apr-2010 kib

When OOM searches for a process to kill, ignore the processes already
killed by OOM. When killed process waits for a page allocation, try to
satisfy the request as fast as possible.

This removes the often encountered deadlock, where OOM continously
selects the same victim process, that sleeps uninterruptibly waiting
for a page. The killed process may still sleep if page cannot be
obtained immediately, but testing has shown that system has much
higher chance to survive in OOM situation with the patch.

In collaboration with: pho
Reviewed by: alc
MFC after: 4 weeks


# 204552 02-Mar-2010 alfred

Merge projects/enhanced_coredumps (r204346) into HEAD:

Enhanced process coredump routines.

This brings in the following features:
1) Limit number of cores per process via the %I coredump formatter.
Example:
if corefilename is set to %N.%I.core AND num_cores = 3, then
if a process "rpd" cores, then the corefile will be named
"rpd.0.core", however if it cores again, then the kernel will
generate "rpd.1.core" until we hit the limit of "num_cores".

this is useful to get several corefiles, but also prevent filling
the machine with corefiles.

2) Encode machine hostname in core dump name via %H.

3) Compress coredumps, useful for embedded platforms with limited space.
A sysctl kern.compress_user_cores is made available if turned on.

To enable compressed coredumps, the following config options need to be set:
options COMPRESS_USER_CORES
device zlib # brings in the zlib requirements.
device gzio # brings in the kernel vnode gzip output module.

4) Eventhandlers are fired to indicate coredumps in progress.

5) The imgact sv_coredump routine has grown a flag to pass in more
state, currently this is used only for passing a flag down to compress
the coredump or not.

Note that the gzio facility can be used for generic output of gzip'd
streams via vnodes.

Obtained from: Juniper Networks
Reviewed by: kan


# 202881 23-Jan-2010 kib

Staticise sigqueue manipulation functions used only in kern_sig.c.

MFC after: 1 week


# 202692 20-Jan-2010 kib

When traced process is about to receive the signal, the process is
stopped and debugger may modify or drop the signal. After the changes to
keep process-targeted signals on the process sigqueue, another thread
may note the old signal on the queue and act before the thread removes
changed or dropped signal from the process queue. Since process is
traced, it usually gets stopped. Or, if the same signal is delivered
while process was stopped, the thread may erronously remove it,
intending to remove the original signal.

Remove the signal from the queue before notifying the debugger. Restore
the siginfo to the head of sigqueue when signal is allowed to be
delivered to the debugee, using newly introduced KSI_HEAD ksiginfo_t
flag. This preserves required order of delivery. Always restore the
unchanged signal on the curthread sigqueue, not to the process queue,
since the thread is about to get it anyway, because sigmask cannot be
changed.

Handle failure of reinserting the siginfo into the queue by falling
back to sq_kill method, calling sigqueue_add with NULL ksi.

If debugger changed the signal to be delivered, use sigqueue_add()
with NULL ksi instead of only setting sq_signals bit.

Reported by: Gardner Bell <gbell72 rogers com>
Analyzed and first version of fix by: Tijl Coosemans <tijl coosemans org>
PR: 142757
Reviewed by: davidxu
MFC after: 2 weeks


# 200082 03-Dec-2009 kib

Remove wrong assertion. Debugee is allowed to lose a signal.

Reported and tested by: jh
MFC after: 2 weeks


# 199355 17-Nov-2009 kib

Among signal generation syscalls, only sigqueue(2) is allowed by POSIX
to fail due to lack of resources to queue siginfo. Add KSI_SIGQ flag
that allows sigqueue_add() to fail while trying to allocate memory for
new siginfo. When the flag is not set, behaviour is the same as for
KSI_TRAP: if memory cannot be allocated, set bit in sq_kill. KSI_TRAP is
kept to preserve KBI.

Add SI_KERNEL si_code, to be used in siginfo.si_code when signal is
generated by kernel. Deliver siginfo when signal is generated by kill(2)
family of syscalls (SI_USER with properly filled si_uid and si_pid), or
by kernel (SI_KERNEL, mostly job control or SIGIO). Since KSI_SIGQ flag
is not set for the ksi, low memory condition cause old behaviour.

Keep psignal(9) KBI intact, but modify it to generate SI_KERNEL
si_code. Pgsignal(9) and gsignal(9) now take ksi explicitely. Add
pksignal(9) that behaves like psignal but takes ksi, and ddb kill
command implemented as pksignal(..., ksi = NULL) to not do allocation
while in debugger.

While there, remove some register specifiers and use ANSI C prototypes.

Reviewed by: davidxu
MFC after: 1 month


# 199136 10-Nov-2009 kib

In r198506, kern_sigsuspend() started doing cursig/postsig loop to make
sure that a signal was delivered to the thread before returning from
syscall. Signal delivery puts new return frame on the user stack, and
modifies trap frame to enter signal handler. As a consequence, syscall
return code sets EINTR as error return for signal frame, instead of the
syscall return.

Also, for ia64, due to different registers layout for those two kind of
frames, usermode sigsegfaulted when returned from signal handler.

Use newly-introduced cpu_set_syscall_retval(9) to set syscall result,
and return EJUSTRETURN from kern_sigsuspend() to prevent syscall return
code from modifying this frame [1].

Another issue is that pending SIGCONT might be cancelled by SIGSTOP,
causing postsig() not to deliver any catched signal [2]. Modify
postsig() to return 1 if signal was posted, and 0 otherwise, and use
this in the kern_sigsuspend loop.

Proposed by: marcel [1]
Noted by: davidxu [2]
Reviewed by: marcel, davidxu
MFC after: 1 month


# 198670 30-Oct-2009 kib

Trapsignal() and postsig() call kern_sigprocmask() with both process
lock and curproc->p_sigacts->ps_mtx. Reschedule_signals may need to have
ps_mtx locked to decide and wakeup a thread, causing recursion on the
mutex.

Inform kern_sigprocmask() and reschedule_signals() about lock state
of the ps_mtx by new flag SIGPROCMASK_PS_LOCKED to avoid recursion.

Reported and tested by: keramida
MFC after: 1 month


# 198590 29-Oct-2009 kib

Trapsignal() calls kern_sigprocmask() when delivering catched signal
with proc lock held.

Reported and tested by: Mykola Dzham freebsd at levsha org ua
MFC after: 1 month


# 198507 27-Oct-2009 kib

In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by: davidxu
Tested by: pho
MFC after: 1 month


# 198506 27-Oct-2009 kib

In kern_sigsuspend(), better manipulate thread signal mask using
kern_sigprocmask() to properly notify other possible candidate threads
for signal delivery.

Since sigsuspend() shall only return to usermode after a signal was
delivered, do cursig/postsig loop immediately after waiting for
signal, repeating the wait if wakeup was spurious due to race with
other thread fetching signal from the process queue before us. Add
thread_suspend_check() call to allow the thread to be stopped or killed
while in loop.

Modify last argument of kern_sigprocmask() from boolean to flags,
allowing the function to be called with locked proc. Convertion of the
callers that supplied 1 to the old argument will be done in the next
commit, and due to SIGPROCMASK_OLD value equial to 1, code is formally
correct in between.

Reviewed by: davidxu
Tested by: pho
MFC after: 1 month


# 197983 12-Oct-2009 jkoshy

Improve the description of sysctl "kern.sugid_coredump".

Submitted by: Mel Flynn <mel.flynn+fbsd.hackers at mailing.thruhere.net>
on -hackers


# 197976 12-Oct-2009 kib

Fix typo.

Submitted by: rdivacky
MFC after: 1 month


# 197963 11-Oct-2009 kib

Currently, when signal is delivered to the process and there is a thread
not blocking the signal, signal is placed on the thread sigqueue. If
the selected thread is in kernel executing thr_exit() or sigprocmask()
syscalls, then signal might be not delivered to usermode for arbitrary
amount of time, and for exiting thread it is lost.

Put process-directed signals to the process queue unconditionally,
selecting the thread to deliver the signal only by the thread returning
to usermode, since only then the thread can handle delivery of signal
reliably. For exiting thread or thread that has blocked some signals,
check whether the newly blocked signal is queued for the process, and
try to find a thread to wakeup for delivery, in reschedule_signal(). For
exiting thread, assume that all signals are blocked.

Change cursig() and postsig() to look both into the thread and process
signal queues. When there is a signal that thread returning to usermode
could consume, TDF_NEEDSIGCHK flag is not neccessary set now. Do
unlocked read of p_siglist and p_pendingcnt to check for queued signals.

Note that thread that has a signal unblocked might get spurious wakeup
and EINTR from the interruptible system call now, due to the possibility
of being selected by reschedule_signals(), while other thread returned
to usermode earlier and removed the signal from process queue. This
should not cause compliance issues, since the thread has not blocked a
signal and thus should be ready to receive it anyway.

Reported by: Justin Teller <justin.teller gmail com>
Reviewed by: davidxu, jilles
MFC after: 1 month


# 197660 01-Oct-2009 kib

Fix typo.

MFC after: 3 days


# 197134 12-Sep-2009 rwatson

Use C99 initialization for struct filterops.

Obtained from: Mac OS X
Sponsored by: Apple Inc.
MFC after: 3 weeks


# 195702 14-Jul-2009 kib

Add new msleep(9) flag PBDY that shall be specified together with
PCATCH, to indicate that thread shall not be stopped upon receipt of
SIGSTOP until it reaches the kernel->usermode boundary.

Also change thread_single(SINGLE_NO_EXIT) to only stop threads at
the user boundary unconditionally.

Tested by: pho
Reviewed by: jhb
Approved by: re (kensmith)


# 195104 27-Jun-2009 rwatson

Replace AUDIT_ARG() with variable argument macros with a set more more
specific macros for each audit argument type. This makes it easier to
follow call-graphs, especially for automated analysis tools (such as
fxr).

In MFC, we should leave the existing AUDIT_ARG() macros as they may be
used by third-party kernel modules.

Suggested by: brooks
Approved by: re (kib)
Obtained from: TrustedBSD Project
MFC after: 1 week


# 194697 23-Jun-2009 pho

vn_open_cred() needs a non NULL ucred pointer

Reviewed by: kib


# 194586 21-Jun-2009 kib

Add another flags argument to vn_open_cred. Use it to specify that some
vn_open_cred invocations shall not audit namei path.

In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by
default implementation of vop_vptocnp, and for the open done for core
file. vn_fullpath is called from the audit code, and vn_open there need
to disable audit to avoid infinite recursion. Core file is created on
return to user mode, that, in particular, happens during syscall return.
The creation of the core file is audited by direct calls, and we do not
want to overwrite audit information for syscall.

Reported, reviewed and tested by: rwatson


# 190888 10-Apr-2009 rwatson

Remove VOP_LEASE and supporting functions. This hasn't been used since
the removal of NQNFS, but was left in in case it was required for NFSv4.
Since our new NFSv4 client and server can't use it for their
requirements, GC the old mechanism, as well as other unused lease-
related code and interfaces.

Due to its impact on kernel programming and binary interfaces, this
change should not be MFC'd.

Proposed by: jeff
Reviewed by: jeff
Discussed with: rmacklem, zach loafman @ isilon


# 189074 26-Feb-2009 ed

Remove even more unneeded variable assignments.

kern_time.c:
- Unused variable `p'.

kern_thr.c:
- Variable `error' is always caught immediately, so no reason to
initialize it. There is no way that error != 0 at the end of
create_thread().

kern_sig.c:
- Unused variable `code'.

kern_synch.c:
- `rval' is always assigned in all different cases.

kern_rwlock.c:
- `v' is always overwritten with RW_UNLOCKED further on.

kern_malloc.c:
- `size' is always initialized with the proper value before being used.

kern_exit.c:
- `error' is always caught and returned immediately. abort2() never
returns a non-zero value.

kern_exec.c:
- `len' is always assigned inside the if-statement right below it.

tty_info.c:
- `td' is always overwritten by FOREACH_THREAD_IN_PROC().

Found by: LLVM's scan-build


# 184667 05-Nov-2008 davidxu

Revert rev 184216 and 184199, due to the way the thread_lock works,
it may cause a lockup.

Noticed by: peter, jhb


# 184199 23-Oct-2008 davidxu

Actually, for signal and thread suspension, extra process spin lock is
unnecessary, the normal process lock and thread lock are enough. The
spin lock is still needed for process and thread exiting to mimic
single sched_lock.


# 183911 15-Oct-2008 davidxu

Move per-thread userland debugging flags into seperated field,
this eliminates some problems of locking, e.g, a thread lock is needed
but can not be used at that time. Only the process lock is needed now
for new field.


# 182371 28-Aug-2008 attilio

Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


# 181334 05-Aug-2008 jhb

If a thread that is swapped out is made runnable, then the setrunnable()
routine wakes up proc0 so that proc0 can swap the thread back in.
Historically, this has been done by waking up proc0 directly from
setrunnable() itself via a wakeup(). When waking up a sleeping thread
that was swapped out (the usual case when waking proc0 since only sleeping
threads are eligible to be swapped out), this resulted in a bit of
recursion (e.g. wakeup() -> setrunnable() -> wakeup()).

With sleep queues having separate locks in 6.x and later, this caused a
spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock).
An attempt was made to fix this in 7.0 by making the proc0 wakeup use
the ithread mechanism for doing the wakeup. However, this required
grabbing proc0's thread lock to perform the wakeup. If proc0 was asleep
elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated
into the same LOR since the thread lock would be some other sleepq lock.

Fix this by deferring the wakeup of the swapper until after the sleepq
lock held by the upper layer has been locked. The setrunnable() routine
now returns a boolean value to indicate whether or not proc0 needs to be
woken up. The end result is that consumers of the sleepq API such as
*sleep/wakeup, condition variables, sx locks, and lockmgr, have to wakeup
proc0 if they get a non-zero return value from sleepq_abort(),
sleepq_broadcast(), or sleepq_signal().

Discussed with: jeff
Glanced at by: sam
Tested by: Jurgen Weber jurgen - ish com au
MFC after: 2 weeks


# 179276 24-May-2008 jb

Add DTrace 'proc' provider probes using the Statically Defined Trace
(sdt) mechanism.


# 177471 21-Mar-2008 jeff

- Add a new td flag TDF_NEEDSUSPCHK that is set whenever a thread needs
to enter thread_suspend_check().
- Set TDF_ASTPENDING along with TDF_NEEDSUSPCHK so we can move the
thread_suspend_check() to ast() rather than userret().
- Check TDF_NEEDSUSPCHK in the sleepq_catch_signals() optimization so
that we don't miss a suspend request. If this is set use the
expensive signal path.
- Set NEEDSUSPCHK when creating a new thread in thr in case the
creating thread is due to be suspended as well but has not yet.

Reviewed by: davidxu (Authored original patch)


# 177368 19-Mar-2008 jeff

- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from
requiring the per-process spinlock to only requiring the process lock.
- Reflect these changes in the proc.h documentation and consumers throughout
the kernel. This is a substantial reduction in locking cost for these
fields and was made possible by recent changes to threading support.


# 177091 12-Mar-2008 jeff

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


# 176936 08-Mar-2008 rwatson

Use sbuf routines to construct core dump filenames rather than custom
string buffer handling, making the code both easier to read and more
robust against string-handling bugs.

MFC after: 1 week


# 176935 08-Mar-2008 rwatson

Unlock the process lock when expand_name() fails, or we may leak the
process lock leading to a hang. This bug was introduced in
kern_sig.c:1.351, when the call to expand_name() was moved earlier
bit this particular error case was not updated.


# 175294 13-Jan-2008 attilio

VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>


# 175202 09-Jan-2008 attilio

vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>


# 174756 18-Dec-2007 obrien

Be more exact with sigaction SA_SIGINFO handling.

Reviewed by: marcel


# 173361 05-Nov-2007 kib

Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with: Peter Holm
Reviewed by: jhb


# 172995 25-Oct-2007 csjp

Implement AUE_CORE, which adds process core dump support into the kernel.
This change introduces audit_proc_coredump() which is called by coredump(9)
to create an audit record for the coredump event. When a process
dumps a core, it could be security relevant. It could be an indicator that
a stack within the process has been overflowed with an incorrectly constructed
malicious payload or a number of other events.

The record that is generated looks like this:

header,111,10,process dumped core,0,Thu Oct 25 19:36:29 2007, + 179 msec
argument,0,0xb,signal
path,/usr/home/csjp/test.core
subject,csjp,csjp,staff,csjp,staff,1101,1095,50457,10.37.129.2
return,success,1
trailer,111

- We allocate a completely new record to make sure we arent clobbering
the audit data associated with the syscall that produced the core
(assuming the core is being generated in response to SIGABRT and not
an invalid memory access).
- Shuffle around expand_name() so we can use the coredump name at the very
beginning of the coredump call. Make sure we free the storage referenced
by "name" if we need to bail out early.
- Audit both successful and failed coredump creation efforts

Obtained from: TrustedBSD Project
Reviewed by: rwatson
MFC after: 1 month


# 172916 23-Oct-2007 csjp

Move where we audit the PID argument such that we unconditionally
audit it at the beginning of the syscall. This fixes a problem
where the user supplies an invalid process ID which is > 0 which
results in the PID argument not being audited.

Obtained from: TrustedBSD Project
MFC after: 1 week


# 171494 19-Jul-2007 jeff

- Calling sched_nice() in tdsigwakeup() is no longer required by ULE and
actually causes LORs and other panics.

Reported by: mlaier
Approved by: re


# 170586 11-Jun-2007 jeff

- Add a missing PROC_SUNLOCK() in tdsignal()


# 170481 09-Jun-2007 mjacob

Initialized ets to zero. This is arguably a gcc bug in that ets is always
set to rts when timeout is non-NULL and then timevalid is set and ets is
only checked later when timervalid is set.


# 170296 04-Jun-2007 jeff

Commit 4/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
sychronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.
- Move some common code into thread_suspend_switch() to handle the
mechanics of suspending a thread. The locking here is incredibly
convoluted and should be simplified.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


# 170174 31-May-2007 jeff

- Move rusage from being per-process in struct pstats to per-thread in
td_ru. This removes the requirement for per-process synchronization in
statclock() and mi_switch(). This was previously supported by
sched_lock which is going away. All modifications to rusage are now
done in the context of the owning thread. reads proceed without locks.
- Aggregate exiting threads rusage in thread_exit() such that the exiting
thread's rusage is not lost.
- Provide a new routine, rufetch() to fetch an aggregate of all rusage
structures from all threads in a process. This routine must be used
in any place requiring a rusage from a process prior to it's exit. The
exited process's rusage is still available via p_ru.
- Aggregate tick statistics only on demand via rufetch() or when a thread
exits. Tick statistics are kept in the thread and protected by sched_lock
until it exits.

Initial patch by: attilio
Reviewed by: attilio, bde (some objections), arch (mostly silent)


# 170152 31-May-2007 kib

Revert UF_OPENING workaround for CURRENT.
Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file.

Proposed and reviewed by: jhb
Reviewed by: daichi (unionfs)
Approved by: re (kensmith)


# 169907 23-May-2007 rwatson

Comment that tdsignal() may be entered from the debugger.


# 167787 21-Mar-2007 jhb

Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes,
rwlocks, and sx locks to 'lock_object'.


# 167232 05-Mar-2007 rwatson

Further system call comment cleanup:

- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
"syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.


# 167211 04-Mar-2007 rwatson

Remove 'MPSAFE' annotations from the comments above most system calls: all
system calls now enter without Giant held, and then in some cases, acquire
Giant explicitly.

Remove a number of other MPSAFE annotations in the credential code and
tweak one or two other adjacent comments.


# 166601 09-Feb-2007 delphij

Give which signal caller has attempted to deliver when panicking.


# 166073 17-Jan-2007 delphij

Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.


# 165537 25-Dec-2006 davidxu

break loop early if we know that there are at least two signals.


# 164184 11-Nov-2006 trhodes

Merge posix4/* into normal kernel hierarchy.

Reviewed by: glanced at by jhb
Approved by: silence on -arch@ and -standards@


# 163709 26-Oct-2006 jb

Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by: davidxu@


# 163601 21-Oct-2006 davidxu

Use macro TAILQ_FOREACH_SAFE instead of expanding it.


# 163541 20-Oct-2006 jhb

Remove the check that prevented signals from being delivered to exiting
processes. It was originally added back when support for Linux threads
(and thus shared sigacts objects) was added, but no one knows why. My
guess is that at some point during the Linux threads patches, the sigacts
object was torn down during exit1(), so this check was added to prevent
a panic for that race. However, the stuff that was actually committed to
the tree doesn't teardown sigacts until wait() making the above race moot.
Re-allowing signals here lets one interrupt a NFS request during process
teardown (such as closing descriptors) on an interruptible mount.

Requested by: kib (long time ago)
MFC after: 1 week


# 163018 04-Oct-2006 davidxu

Move some declaration of 32-bit signal structures into file
freebsd32-signal.h, implement sigtimedwait and sigwaitinfo system calls.


# 158471 12-May-2006 jhb

Remove various bits of conditional Alpha code and fixup a few comments.


# 158351 07-May-2006 tegge

Call vn_finished_write() before calling the coredump handler which will
indirectly call vn_start_write() as necessary for each write.


# 157948 21-Apr-2006 ps

Don't try to kill embryonic processes in killpg1(). This prevents
a race condition between fork() and kill(pid,sig) with pid < 0 that
can cause a kernel panic.

Submitted by: up
MFC after: 3 weeks


# 157233 28-Mar-2006 jhb

- Conditionalize Giant around VFS operations for ALQ, ktrace, and
generating a coredump as the result of a signal.
- Fix a bug where we could leak a Giant lock if vn_start_write() failed
in coredump().

Reported by: jmg (2)


# 156472 09-Mar-2006 davidxu

Remove _STOPEVENT call, it is already called in issignal, simplify
code for SIGKILL signal.


# 156213 02-Mar-2006 davidxu

Add signal set sq_kill to sigqueue structure, the member saves all
signals sent by kill() syscall, without this, a signal sent by
sigqueue() can cause a signal sent by kill() to be lost.


# 155947 23-Feb-2006 davidxu

1. Refine kern_sigtimedwait() to remove redundant code.
2. Fix a bug, if thread got a SIGKILL signal, call sigexit() to kill
its process.

MFC after: 3 days


# 155940 23-Feb-2006 davidxu

Code cleanup, simply compare with curproc.


# 155741 15-Feb-2006 davidxu

Fix a long standing race between sleep queue and thread
suspension code. When a thread A is going to sleep, it calls
sleepq_catch_signals() to detect any pending signals or thread
suspension request, if nothing happens, it returns without
holding process lock or scheduler lock, this opens a race
window which allows thread B to come in and do process
suspension work, however since A is still at running state,
thread B can do nothing to A, thread A continues, and puts
itself into actually sleeping state, but B has never seen it,
and it sits there forever until B is woken up by other threads
sometimes later(this can be very long delay or never
happen). Fix this bug by forcing sleepq_catch_signals to
return with scheduler lock held.
Fix sleepq_abort() by passing it an interrupted code, previously,
it worked as wakeup_one(), and the interruption can not be
identified correctly by sleep queue code when the sleeping
thread is resumed.
Let thread_suspend_check() returns EINTR or ERESTART, so sleep
queue no longer has to use SIGSTOP as a hack to build a return
value.

Reviewed by: jhb
MFC after: 1 week


# 155633 13-Feb-2006 wsalamon

Audit the arguments to the kill(2) and killpg(2) system calls.

Obtained from: TrustedBSD Project
Approved by: rwatson (mentor)


# 155594 13-Feb-2006 davidxu

In order to speed up process suspension on MP machine, send IPI to
remote CPU. While here, abstract thread suspension code into a function
called sig_suspend_threads, the function is called when a process received
a STOP signal.


# 155298 04-Feb-2006 davidxu

Create childproc_jobstate function to report job control state, this
also fixes a bug in childproc_continued which ignored PS_NOCLDSTOP.


# 153697 24-Dec-2005 davidxu

Avoid kernel panic when attaching a process which may not be stopped
by debugger, e.g process is dumping core. Only access p_xthread if
P_STOPPED_TRACE is set, this means thread is ready to exchange signal
with debugger, print a warning if P_STOPPED_TRACE is not set due to
some bugs in other code, if there is.

The patch has been tested by Anish Mistry mistry.7 at osu dot edu, and
is slightly adjusted.


# 153264 09-Dec-2005 davidxu

Add a sysctl to force a process to sigexit if a trap signal is
being hold by current thread or ignored by current process,
otherwise, it is very possible the thread will enter an infinite loop
and lead to an administrator's nightmare.


# 153252 09-Dec-2005 davidxu

Cleanup sigqueue sysctl.


# 153153 06-Dec-2005 davidxu

Fix a lock leak in childproc_continued().


# 152975 30-Nov-2005 davidxu

set signal queue values for sysconf().


# 152327 12-Nov-2005 davidxu

Make sure only remove one signal by debugger.


# 152223 09-Nov-2005 davidxu

WIFxxx macros requires an int type but p_xstat is short, convert it
to int before using the macros.

Bug reported by : Pyun YongHyeon pyunyh at gmail dot com


# 152185 08-Nov-2005 davidxu

Add support for queueing SIGCHLD same as other UNIX systems did.

For each child process whose status has been changed, a SIGCHLD instance
is queued, if the signal is stilling pending, and process changed status
several times, signal information is updated to reflect latest process
status. If wait() returns because the status of a child process is
available, pending SIGCHLD signal associated with the child process is
discarded. Any other pending SIGCHLD signals remain pending.

The signal information is allocated at the same time when proc structure
is allocated, if process signal queue is fully filled or there is a memory
shortage, it can still send the signal to process.

There is a booting time tunable kern.sigqueue.queue_sigchild which
can control the behavior, setting it to zero disables the SIGCHLD queueing
feature, the tunable will be removed if the function is proved that it is
stable enough.

Tested on: i386 (SMP and UP)


# 152029 04-Nov-2005 davidxu

Fix name compatible problem with POSIX standard. the sigval_ptr and
sigval_int really should be sival_ptr and sival_int.
Also sigev_notify_function accepts a union sigval value but not a
pointer.


# 151993 03-Nov-2005 davidxu

Cleanup some signal interfaces. Now the tdsignal function accepts
both proc pointer and thread pointer, if thread pointer is NULL,
tdsignal automatically finds a thread, otherwise it sends signal
to given thread.
Add utility function psignal_event to send a realtime sigevent
to a process according to the delivery requirement specified in
struct sigevent.


# 151869 30-Oct-2005 davidxu

Let itimer store itimerspec instead of itimerval, so I don't have to
convert to or from timeval frequently.

Introduce function itimer_accept() to ack a timer signal in signal
acceptance code, this allows us to return more fresh overrun counter
than at signal generating time. while POSIX says:
"the value returned by timer_getoverrun() shall apply to the most
recent expiration signal delivery or acceptance for the timer,.."
I prefer returning it at acceptance time.

Introduce SIGEV_THREAD_ID notification mode, it is used by thread
libary to request kernel to deliver signal to a specified thread,
and in turn, the thread library may use the mechanism to implement
SIGEV_THREAD which is required by POSIX.

Timer signal is managed by timer code, so it can not fail even if
signal queue is full filled by sigqueue syscall.


# 151575 23-Oct-2005 davidxu

1. Make ksiginfo_alloc and ksiginfo_free public.
2. Introduce flags KSI_EXT and KSI_INS. The flag KSI_EXT allows a ksiginfo
to be managed by outside code, the KSI_INS indicates sigqueue_add should
directly insert passed ksiginfo into queue other than copy it.


# 151316 14-Oct-2005 davidxu

1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
sendsig use discrete parameters, now they uses member fields of
ksiginfo_t structure. For sendsig, this change allows us to pass
POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
an extra siginfo list holds additional siginfo_t data for signals.
kernel code uses psignal() still behavior as before, it won't be failed
even under memory pressure, only exception is when deleting a signal,
we should call sigqueue_delete to remove signal from sigqueue but
not SIGDELSET. Current there is no kernel code will deliver a signal
with additional data, so kernel should be as stable as before,
a ksiginfo can carry more information, for example, allow signal to
be delivered but throw away siginfo data if memory is not enough.
SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
not be caught or masked.
The sigqueue() syscall allows user code to queue a signal to target
process, if resource is unavailable, EAGAIN will be returned as
specification said.
Just before thread exits, signal queue memory will be freed by
sigqueue_flush.
Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64


# 147046 06-Jun-2005 davidxu

Fix a bug relavant to debugging, a masked signal unexpectedly interrupts
a sleeping thread when process is being debugged.

PR: GNU/77818
Tested by: Sean C. Farley <sean-freebsd at farley org>


# 145261 19-Apr-2005 davidxu

Oops, forgot to update this file.
Fix a race condition between kern_wait() and thread_stopped().
Problem is in kern_wait(), parent process steps through children list,
once a child process is skipped, and later even if the child is stopped,
parent process still sleeps in msleep(), the race happens if parent
masked SIGCHLD.

Submitted by : Peter Edwards peadar.edwards at gmail dot com
MFC after : 4 days


# 144851 10-Apr-2005 das

Suspend all other threads in the process while generating a core dump.
The main reason for doing this is that the ELF dump handler expects
the thread list to be fixed while the dump header is generated, so an
upcall that occurs at the wrong time can lead to buffer overruns and
other Bad Things.

Another solution would be to grab sched_lock in the ELF dump handler,
but we might as well single-thread, since the process is about to die.
Furthermore, I think this should ensure that the register sets in the
core file are sequentially consistent.


# 143144 04-Mar-2005 davidxu

The td_waitset is pointing to a stack address when thread is waiting
for a signal, because kernel stack is swappable, this causes page fault
in kernel under heavy swapping case. Fix this bug by eliminating unneeded
code.


# 143033 02-Mar-2005 davidxu

In kern_sigtimedwait, remove waitset bits for td_sigmask before
sleeping, so in do_tdsignal, we no longer need to test td_waitset.
now td_waitset is only used to give a thread higher priority when
delivering signal to multithreads process.
This also fixes a bug:
when a thread in sigwait states was suspended and later resumed
by SIGCONT, it can no longer receive signals belong to waitset.


# 142072 19-Feb-2005 davidxu

Don't restart a timeout wait in kern_sigtimedwait, also allow it
to wait longer than a single integer can represent.


# 141815 13-Feb-2005 sobomax

Backout previous change (disabling of security checks for signals delivered
in emulation layers), since it appears to be too broad.

Requested by: rwatson


# 141812 13-Feb-2005 sobomax

Split out kill(2) syscall service routine into user-level and kernel part, the
former is callable from user space and the latter from the kernel one. Make
kernel version take additional argument which tells if the respective call
should check for additional restrictions for sending signals to suid/sugid
applications or not.

Make all emulation layers using non-checked version, since signal numbers in
emulation layers can have different meaning that in native mode and such
protection can cause misbehaviour.

As a result remove LIBTHR from the signals allowed to be delivered to a
suid/sugid application.

Requested (sorta) by: rwatson
MFC after: 2 weeks


# 139804 06-Jan-2005 imp

/* -> /*- for copyright notices, minor format tweaks as necessary


# 138811 13-Dec-2004 jeff

- If delivering a signal will result in killing a process that has a
nice value above 0, set it to 0 so that it may proceed with haste.
This is especially important on ULE, where adjusting the priority
does not guarantee that a thread will be granted a greater time slice.


# 137746 15-Nov-2004 imp

Fix an off by one error. MAXPATHLEN already has +1.


# 137058 30-Oct-2004 alfred

Allow kill -9 to kill processes stuck in procfs STOPEVENTs.


# 137030 29-Oct-2004 alfred

Backout 1.291.

re doesn't seem to think this fixes:
Desired features for 5.3-RELEASE "More truss problems"


# 136142 05-Oct-2004 davidxu

Use scheduler api to adjust thread priority.


# 136086 03-Oct-2004 davidxu

Don't bother to turn off other P_STOPPED bits for SIGKILL, doing
so would cause kernel to produce an unkillable process in some cases,
especially, P_STOPPED_SINGLE has a singling thread, turning off the
bit would mess the state.


# 136024 01-Oct-2004 alfred

Clear a process's procfs trace points upon delivery of SIGKILL.

MT5 candidate. (Desired features for 5.3-RELEASE "More truss problems")


# 134571 31-Aug-2004 julian

Remove an unneeded argument..
The removed argument could trivially be derived from the remaining one.
That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some syste,s so leave it as an argument.
Having both proc and thread as an argumen tjust gives an opportunity for
them to get out sync.

MFC after: 3 days


# 133741 15-Aug-2004 jmg

Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowlege of how things are
implemented from the drivers. Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using it's filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
aquire duplicate locks. Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by: green, rwatson (both earlier versions)


# 133354 09-Aug-2004 jmg

add option to automaticly mark core dumps with the nodump flag

PR: 57065
Submitted by: Walter C. Pelissero


# 133078 03-Aug-2004 pjd

Don't skip permission checks when sending signals to zombie processes.

Pointed out by: bde
Reviewed by: rwatson


# 132856 29-Jul-2004 pjd

Syscall kill(2) called for a zombie process should return 0.

Obtained from: Darwin


# 132264 16-Jul-2004 jhb

Improve readability a bit by changing some code at the end of a function
that did:

if (foo)
return
else
blah

to just do the simpler

if (!foo)
blah

instead.


# 132087 13-Jul-2004 davidxu

Add code to support debugging threaded process.

1. Add tm_lwpid into kse_thr_mailbox to indicate which kernel
thread current user thread is running on. Add tm_dflags into
kse_thr_mailbox, the flags is written by debugger, it tells
UTS and kernel what should be done when the process is being
debugged, current, there two flags TMDF_SSTEP and TMDF_DONOTRUNUSER.

TMDF_SSTEP is used to tell kernel to turn on single stepping,
or turn off if it is not set.

TMDF_DONOTRUNUSER is used to tell kernel to schedule upcall
whenever possible, to UTS, it means do not run the user thread
until debugger clears it, this behaviour is necessary because
gdb wants to resume only one thread when the thread's pc is
at a breakpoint, and thread needs to go forward, in order to
avoid other threads sneak pass the breakpoints, it needs to remove
breakpoint, only wants one thread to go. Also, add km_lwp to
kse_mailbox, the lwp id is copied to kse_thr_mailbox at context
switch time when process is not being debugged, so when process
is attached, debugger can map kernel thread to user thread.

2. Add p_xthread to proc strcuture and td_xsig to thread structure.
p_xthread is used by a thread when it wants to report event
to debugger, every thread can set the pointer, especially, when
it is used in ptracestop, it is the last thread reporting event
will win the race. Every thread has a td_xsig to exchange signal
with debugger, thread uses TDF_XSIG flag to indicate it is reporting
signal to debugger, if the flag is not cleared, thread will keep
retrying until it is cleared by debugger, p_xthread may be
used by debugger to indicate CURRENT thread. The p_xstat is still
in proc structure to keep wait() to work, in future, we may
just use td_xsig.

3. Add TDF_DBSUSPEND flag, the flag is used by debugger to suspend
a thread. When process stops, debugger can set the flag for
thread, thread will check the flag in thread_suspend_check,
enters a loop, unless it is cleared by debugger, process is
detached or process is existing. The flag is also checked in
ptracestop, so debugger can temporarily suspend a thread even
if the thread wants to exchange signal.

4. Current, in ptrace, we always resume all threads, but if a thread
has already a TDF_DBSUSPEND flag set by debugger, it won't run.

Encouraged by: marcel, julian, deischen


# 132016 12-Jul-2004 marcel

Implement the PT_LWPINFO request. This request can be used by the
tracing process to obtain information about the LWP that caused the
traced process to stop. Debuggers can use this information to select
the thread currently running on the LWP as the current thread.

The request has been made compatible with NetBSD for as much as
possible. This implementation differs from NetBSD in the following
ways:
1. The data argument is allowed to be smaller than the size of the
ptrace_lwpinfo structure known to the kernel, but not 0. This
is opposite to what NetBSD allows. The reason for this is that
we can extend the structure without affecting older binaries.
2. On NetBSD the tracing process is to set the pl_lwpid field to
the Id of the LWP it wants information of. We don't do that.
Our ptrace interface allows passing the LWP Id instead of the
PID. The tracing process is to set the PID to the LWP Id it
wants information of.
3. When the PID is actually the PID of the tracing process, this
request returns the information about the LWP that caused the
process to stop. This was the whole purpose of the request in
the first place.

When the traced process has exited, this request will return the
LWP Id 0, indicating that the process state is not the result of
an event specific to a LWP.


# 131473 02-Jul-2004 jhb

- Change mi_switch() and sched_switch() to accept an optional thread to
switch to. If a non-NULL thread pointer is passed in, then the CPU will
switch to that thread directly rather than calling choosethread() to pick
a thread to choose to.
- Make sched_switch() aware of idle threads and know to do
TD_SET_CAN_RUN() instead of sticking them on the run queue rather than
requiring all callers of mi_switch() to know to do this if they can be
called from an idlethread.
- Move constants for arguments to mi_switch() and thread_single() out of
the middle of the function prototypes and up above into their own
section.


# 130344 11-Jun-2004 phk

Deorbit COMPAT_SUNOS.

We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither
a sparc32 port nor a SunOS4.x compatibility desire these days.


# 130192 07-Jun-2004 davidxu

According to SUSv3, sigwait is different with sigwaitinfo, sigwait
returns error code in return value, not in errno.


# 129989 02-Jun-2004 tjr

Move TDF_SA from td_flags to td_pflags (and rename it accordingly)
so that it is no longer necessary to hold sched_lock while
manipulating it.

Reviewed by: davidxu


# 129543 21-May-2004 bde

Fixed some style bugs in tdsigwakeup().


# 129513 20-May-2004 jhb

In tdsigwakeup(), use TD_ON_SLEEPQ() rather than TD_IS_SLEEPING() to see if
a thread is on a sleep queue and should have it's sleep aborted.

Reported by: Thierry Herbelot thierry at herbelot dot com


# 128159 12-Apr-2004 cperciva

stop() no longer needs sched_lock held; in fact, holding sched_lock causes
a LOR against sleepq. Fix the comment, and fix ptracestop() to pick up
sched_lock after stop() rather than before.

Reported by: Scott Sipe <cscotts@mindspring.com>
Reviewed by: rwatson, jhb


# 127911 05-Apr-2004 imp

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# 127594 29-Mar-2004 peter

Shorten some XXXKSE commentry


# 126672 05-Mar-2004 jhb

- Push down Giant in exit() and wait().
- Push Giant down a bit in coredump() and call coredump() with the proc
lock already held rather than unlocking it only to turn around and
relock it.

Requested by: peter


# 126567 03-Mar-2004 des

Use different dummy wait channels to avoid panic in msleep().

Reviewed by: jhb


# 126326 27-Feb-2004 jhb

Switch the sleep/wakeup and condition variable implementations to use the
sleep queue interface:
- Sleep queues attempt to merge some of the benefits of both sleep queues
and condition variables. Having sleep qeueus in a hash table avoids
having to allocate a queue head for each wait channel. Thus, struct cv
has shrunk down to just a single char * pointer now. However, the
hash table does not hold threads directly, but queue heads. This means
that once you have located a queue in the hash bucket, you no longer have
to walk the rest of the hash chain looking for threads. Instead, you have
a list of all the threads sleeping on that wait channel.
- Outside of the sleepq code and the sleep/cv code the kernel no longer
differentiates between cv's and sleep/wakeup. For example, calls to
abortsleep() and cv_abort() are replaced with a call to sleepq_abort().
Thus, the TDF_CVWAITQ flag is removed. Also, calls to unsleep() and
cv_waitq_remove() have been replaced with calls to sleepq_remove().
- The sched_sleep() function no longer accepts a priority argument as
sleep's no longer inherently bump the priority. Instead, this is soley
a propery of msleep() which explicitly calls sched_prio() before
blocking.
- The TDF_ONSLEEPQ flag has been dropped as it was never used. The
associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been
dropped and replaced with a single explicit clearing of td_wchan.
TD_SET_ONSLEEPQ() would really have only made sense if it had taken
the wait channel and message as arguments anyway. Now that that only
happens in one place, a macro would be overkill.


# 125454 04-Feb-2004 jhb

Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count. The plimit
structure is treated similarly to struct ucred in that is is always copy
on write, so having a reference to a structure is sufficient to read from
it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
limits from a process to keep the limit structure from changing out from
under you while reading from it.
- Various global limits that are ints are not protected by a lock since
int writes are atomic on all the archs we support and thus a lock
wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
either an rlimit, or the current or max individual limit of the specified
resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
(it didn't used the stackgap when it should have) but uses lim_rlimit()
and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits. It
also no longer uses the stackgap for accessing sysctl's for the
ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result,
ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by: mtm (mostly, I only did a few cleanups and catchups)
Tested on: i386
Compiled on: alpha, amd64


# 125186 28-Jan-2004 rwatson

Assert process lock in ptracestop(), since we're going to rely
on it, and later unlock it.


# 125131 28-Jan-2004 kan

Move the part of the comment which applies to osigsuspend where
it belongs. The current sigsuspend syscall does expect a pointer
to the mask as argument.

Submitted by: Igor Sysoev <is at rambler-co dot ru>


# 124944 25-Jan-2004 jeff

- Add a flags parameter to mi_switch. The value of flags may be SW_VOL or
SW_INVOL. Assert that one of these is set in mi_switch() and propery
adjust the rusage statistics. This is to simplify the large number of
users of this interface which were previously all required to adjust the
proper counter prior to calling mi_switch(). This also facilitates more
switch and locking optimizations.
- Change all callers of mi_switch() to pass the appropriate paramter and
remove direct references to the process statistics.


# 124359 11-Jan-2004 rwatson

When not creating a core dump due to resource limits specifying
a maximum dump size of 0, return a size-related error, rather
than returning success. Otherwise, waitpid() will incorrectly
return a status indicating that a core dump was created. Note
that the specific error doesn't actually matter, since it's lost.

MFC after: 2 weeks
PR: 60367
Submitted by: Valentin Nechayev <netch@netch.kiev.ua>


# 124267 08-Jan-2004 rwatson

Drop the sigacts mutex around calls to stopevent() to avoid sleeping
holding the mutex. Because the sigacts pointer can't change while
the process is "live" (proc locking (x)), we know our pointer is still
valid.

In communication with: truckman
Reviewed by: jhb


# 124092 03-Jan-2004 davidxu

Make sigaltstack as per-threaded, because per-process sigaltstack state
is useless for threaded programs, multiple threads can not share same
stack.
The alternative signal stack is private for thread, no lock is needed,
the orignal P_ALTSTACK is now moved into td_pflags and renamed to
TDP_ALTSTACK.
For single thread or Linux clone() based threaded program, there is no
semantic changed, because those programs only have one kernel thread
in every process.

Reviewed by: deischen, dfr


# 123273 07-Dec-2003 davidxu

Lock and unlock sched_lock when walking through thread list, current we
insert kse upcall thread into thread list at mi_switch time, process lock
is not enough.


# 121721 30-Oct-2003 davidxu

Try to fetch thread mailbox address in page fault trap, so when thread
blocks in page fault hanlder, and upcall thread can be scheduled. It is
useful if process is doing lots of mmap based I/O.


# 121510 25-Oct-2003 rwatson

Check (locked) before performing an advisory unlock following a failure
of vn_start_write(). Otherwise, we may inconsistently attempt to release
the advisory lock.

Pointed out by: teggej


# 121509 25-Oct-2003 rwatson

When generate a core dump, use advisory locking in an advisory way:
if we do acquire an advisory lock, great! We'll release it later.
However, if we fail to acquire a lock, we perform the coredump
anyway. This problem became particularly visible with NFS after
the introduction of rpc.lockd: if the lock manager isn't running,
then locking calls will fail, aborting the core dump (resulting in
a zero-byte dump file).

Reported by: Yogeshwar Shenoy <ynshenoy@alumni.cs.ucsb.edu>


# 121070 13-Oct-2003 davidxu

Don't clear signal mask in execsig(). RELENG_4 does not clear it and POSIX
asks to inherit signal mask for execv.


# 120475 26-Sep-2003 robert

Move some tracing related code into its own function as it will
be needed for system call related ptrace functionality I plan
to commit soon.


# 118750 10-Aug-2003 nectar

panic() if we try to handle an out-of-range signal number in
psignal()/tdsignal(). The test was historically in psignal(). It was
changed into a KASSERT, and then later moved to tdsignal() when the
latter was introduced.

Reviewed by: iedowse, jhb


# 118231 30-Jul-2003 davidxu

Use correct signal when calling sigexit.


# 118094 27-Jul-2003 phk

Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout.


# 117972 24-Jul-2003 mtm

The POSIX spec also requires that kern_sigtimedwait return
EINVAL if tv_nsec of the timeout is less than zero.


# 117814 20-Jul-2003 davidxu

Always deliver synchronous signal to UTS for SA threads.


# 117705 17-Jul-2003 davidxu

Fix sigwait to conform to POSIX.
When a signal is being delivered to process, first find a sigwait
thread to deliver, POSIX's argument is speed of delivering signal
to sigwait thread is faster than other ways. A signal in its wait
set will cause sigwait to return the signal number, a signal not
in its wait set but in not blocked by the thread also causes sigwait
to return, but sigwait returns EINTR, sigwait is oneshot operation,
only one signal can be delivered to its wait set, when a signal is
delivered to the sigwait thread, the thread's sigwait state is canceled.


# 117607 15-Jul-2003 davidxu

Rename thread_siginfo to cpu_thread_siginfo


# 117444 11-Jul-2003 davidxu

If a thread is sending signal to its process, if the thread can handle
the signal itself, it should get it without looking for other threads.


# 117255 05-Jul-2003 mtm

Make the conditional, which decides what siglist to put a signal on,
more concise and improve the comment.

Submitted by: bde


# 117205 03-Jul-2003 mtm

Signals sent specifically to a particular thread must
be delivered to that thread, regardless of whether it
has it masked or not.

Previously, if the targeted thread had the signal masked,
it would be put on the processes' siglist. If
another thread has the signal umasked or unmasks it before
the target, then the thread it was intended for would never
receive it.

This patch attempts to solve the problem by requiring callers
of tdsignal() to say whether the signal is for the thread or
for the process. If it is for the process, then normal processing
occurs and any thread that has it unmasked can receive it.
But if it is destined for a specific thread, it is put on
that thread's pending list regardless of whether it is currently
masked or not.

The new behaviour still needs more work, though. If the signal
is reposted for some reason it is always posted back to the
thread that handled it because the information regarding the
target of the signal has been lost by then.

Reviewed by: jdp, jeff, bde (style)


# 116963 28-Jun-2003 davidxu

o Change kse_thr_interrupt to allow send a signal to a specified thread,
or unblock a thread in kernel, and allow UTS to specify whether syscall
should be restarted.
o Add ability for UTS to monitor signal comes in and removed from process,
the flag PS_SIGEVENT is used to indicate the events.
o Add a KMF_WAITSIGEVENT for KSE mailbox flag, UTS call kse_release with
this flag set to wait for above signal event.
o For SA based thread, kernel masks all signal in its signal mask, let
UTS to use kse_thr_interrupt interrupt a thread, and install a signal
frame in userland for the thread.
o Add a tm_syncsig in thread mailbox, when a hardware trap occurs,
it is used to deliver synchronous signal to userland, and upcall
is schedule, so UTS can process the synchronous signal for the thread.

Reviewed by: julian (mentor)


# 116961 28-Jun-2003 davidxu

Fix POSIX compatible bug for sigwaitinfo and sigtimedwait.
POSIX says siginfo pointer parameter can be NULL and if the
function success, it should return signal number but not zero.
The waitset it past should be negatived before it can be
used as thread signal mask.


# 116595 20-Jun-2003 davidxu

When a STOP signal is being sent to a process, it is possible all
threads in the process have already masked the signal, so job control
is delayed. But later a thread unmasking the STOP signal should enable
job control, so in issignal(), scanning all threads in process to see
if we can direct suspend some of them, not just suspend current thread.


# 116594 19-Jun-2003 davidxu

Fix typo. td should be td0.


# 116401 15-Jun-2003 davidxu

1. Add code to support bound thread. when blocked, a bound thread never
schedules an upcall. Signal delivering to a bound thread is same as
non-threaded process. This is intended to be used by libpthread to
implement PTHREAD_SCOPE_SYSTEM thread.
2. Simplify kse_release() a bit, remove sleep loop.


# 116361 14-Jun-2003 davidxu

Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.


# 116182 10-Jun-2003 obrien

Use __FBSDID().


# 116101 09-Jun-2003 jhb

- Add a td_pflags field to struct thread for private flags accessed only by
curthread. Unlike td_flags, this field does not need any locking.
- Replace the td_inktr and td_inktrace variables with equivalent private
thread flags.
- Move TDF_OLDMASK over to the private flags field so it no longer requires
sched_lock.


# 115046 15-May-2003 obrien

Fix long standing bug that prevents the PT_CONTINUE, PT_KILL and
PT_DETACH ptrace(2) requests from functioning as advertised in the
manual page. As described in kern/35175, the PT_DETACH request will,
under certain circumstances, pass an unwanted signal on to the traced
process upan detaching from it. The PT_CONTINUE request will
sometimes fail if you make it pass a signal that has "properties" that
differ from the properties of the signal that origionally caused the
traced process to be stopped. Since PT_KILL is nothing than
PT_CONTINUE with SIGKILL, it is broken too. In the PT_KILL case, this
leads to an unkillable process.

PR: 44011
Submitted by: Mark Kettenis <kettenis@chello.nl>
Approved by: re(jhb)


# 114983 13-May-2003 jhb

- Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
and thread_stopped() are now MP safe.

Reviewed by: arch@
Approved by: re (rwatson)


# 114861 09-May-2003 jhb

Remove Giant from kern_sigsuspend() and osigsuspend() as these should now
be MP safe.

Approved by: re (scottl)


# 114757 05-May-2003 jhb

Mostly sort the includes.


# 114755 05-May-2003 jhb

Lock the proc lock around calls to tdsignal() in the sigwait() family of
syscalls.


# 114754 05-May-2003 jhb

Make issignal() private to kern_sig.c since it is only called from cursig()
and cursig() is now a function rather than a macro.


# 114324 30-Apr-2003 jhb

Forgot to remove Giant around call to kern_sigaction() in
freebsd4_sigaction() in revision 1.232.


# 114027 25-Apr-2003 jhb

Push Giant down into kern_sigaction() instead of locking it around calls
to kern_sigaction() in the various callers of the function.


# 113928 23-Apr-2003 jhb

Remove Giant from osigblock(), osigsetmask(), and kern_sigaltstack().


# 113922 23-Apr-2003 jhb

- Reorganize osigstack() to do the copyin first, grab the proc lock once,
do all the various sigstack dances, unlock the proc lock, and finally do
the copyout. This more closely resembles the behavior of
kern_sigaltstack() and closes a small race.
- Remove Giant from osigstack as it is no longer needed.


# 113706 19-Apr-2003 davidxu

Unbreak sigaltstack syscall. sigonstack is now a function and
want proc lock be held.


# 113690 18-Apr-2003 jhb

- Make sigonstack() a regular function instead of an inline and add a proc
lock assertion to it.
- SIGPENDING() no longer needs sched_lock, so only grab sched_lock to set
the TDF_NEEDSIGCHK and TDF_ASTPENDING flags in signotify().
- Add a proc lock assertion to tdsigwakeup().
- Since we always set TDF_OLDMASK while holding the proc lock, the proc
lock is sufficient protection to check its state in postsig() and we only
need sched_lock when clearing the actual flag.


# 113685 18-Apr-2003 jhb

Rename do_sigprocmask() to kern_sigprocmask() and make it a global symbol
so that it can be used by binary emulators.


# 113615 17-Apr-2003 jhb

Don't hold the proc lock while performing sigset conversions on local
variables.


# 113614 17-Apr-2003 jhb

- Remove garbage SIGSETOR() that snuck into struct sigpending_args
definition.
- Use the proper constant for the last arg to kern_sigaction() in osigvec()
instead of a magic value.


# 113378 12-Apr-2003 davidxu

Style fix.


# 113375 11-Apr-2003 davidxu

Check SIG_HOLD action ealier to avoid missing test it in later code.


# 112932 01-Apr-2003 jeff

- p will be unused in cursig() if INVARIANTS is not defined. Access it
through td->td_proc to avoid the unused variable.

Spotted by: Maxim Konovalov <maxim@macomnet.ru>


# 112893 31-Mar-2003 jeff

- Define sigwait, sigtimedwait, and sigwaitinfo in terms of
kern_sigtimedwait() which is capable of supporting all of their semantics.
- These should be POSIX compliant but more careful review is needed before
we announce this.


# 112890 31-Mar-2003 jeff

- The siglist in the proc holds signals that were blocked by all threads
when they were delivered. In signotify() check to see if we have
unblocked any of those signals and post them to the thread.
- Use td_sigmask instead of p_sigmask in all cases.
- In sigpending return both signals pending on the thread and proc.
- Define a function, sigtd(), that finds the appropriate thread to deliver
the signal to if psignal() has been called instead of tdsignal().
- Define a function, tdsignal(), that delivers a signal to a specific thread
or if that thread has the signal blocked it may deliver it to the process
where it will wait for a thread to unblock it.
- Since we are delivering signals to a specific thread we do not need to
abort the sleep of all threads.
- Rename the old tdsignal() to tdsigwakeup().
- Save and restore the old signal mask to and from the thread.


# 112888 31-Mar-2003 jeff

- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with
a follow on commit to kern_sig.c
- signotify() now operates on a thread since unmasked pending signals are
stored in the thread.
- PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.


# 112884 31-Mar-2003 jeff

- Mark signals which may be delivered to any thread in the process with
SA_PROC. Signals without this flag should be directed to a particular
thread if this is possible.


# 112883 31-Mar-2003 jeff

- Change trapsignal() to accept a thread and not a proc.
- Change all consumers to pass in a thread.

Right now this does not cause any functional changes but it will be important
later when signals can be delivered to specific threads.


# 112079 11-Mar-2003 davidxu

This is a force-commit for:
kern_sig.c 1.215
kern_thread.c 1.103
kern_exit.c 1.199
proc.h 1.302

Orignal code would suspend an already suspended thread,
if user presses ^Z while a threaded program is running. Also
there is a race between job control and thread_exit(), the
new code tests job control requesting before thread exits,
in wait() syscall, be sure to check child process is fully
stopped, this avoids a later SIGCHILD and returns STOPPED
status twice for a threaded child proc. A thread_stopped()
function is added for common code in several places.


# 112071 10-Mar-2003 davidxu

Fix threaded process job control bug. SMP tested.

Reviewed by: julian


# 112014 08-Mar-2003 tjr

Hold the proc lock while accessing p_procsig in trapsignal().


# 111883 04-Mar-2003 jhb

Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls
to WITNESS_WARN().


# 111585 27-Feb-2003 julian

Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.


# 111545 26-Feb-2003 davidxu

Fix a bug when handling SIGCONT.

Reported By: Mike Makonnen <mtm@identd.net>


# 111033 17-Feb-2003 jeff

- Add a new function, thread_signal_add(), that is called from postsig to
add a signal to a mailbox's pending set.
- Add a new function, thread_signal_upcall(), this causes the current thread
to upcall so that we can deliver pending signals.

Reviewed by: mini


# 111032 17-Feb-2003 julian

Move a bunch of flags from the KSE to the thread.
I was in two minds as to where to put them in the first case..
I should have listenned to the other mind.

Submitted by: parts by davidxu@
Reviewed by: jeff@ mini@


# 111028 17-Feb-2003 jeff

- Split the struct kse into struct upcall and struct kse. struct kse will
soon be visible only to schedulers. This greatly simplifies much the
KSE code.

Submitted by: davidxu


# 110927 15-Feb-2003 tjr

Acquire Giant around calls to kern_sigaction() in sigaction(),
freebsd4_sigaction() and osigaction() instead of around the whole
body of those functions. They now no longer hold Giant around calls
to copyin() and copyout(), and it is slightly more obvious what
Giant is protecting.


# 110925 15-Feb-2003 tjr

osigpending() no longer needs Giant, for the same reason sigpending()
does not.


# 110923 15-Feb-2003 tjr

All uses of p_siglist are protected by the proc lock now, so there's
no need to acquire Giant in sigpending() anymore.


# 110190 01-Feb-2003 julian

Reversion of commit by Davidxu plus fixes since applied.

I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.


# 109958 27-Jan-2003 peter

No longer force COMPAT_FREEBSD4 to be on.


# 109877 26-Jan-2003 davidxu

Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.

Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.

When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian


# 108863 07-Jan-2003 davidxu

Forgot to call setrunnable() for un-idled thread.


# 108862 07-Jan-2003 davidxu

Check signals for idled threads.


# 108338 27-Dec-2002 julian

Add code to ddb to allow backtracing an arbitrary thread.
(show thread {address})

Remove the IDLE kse state and replace it with a change in
the way threads sahre KSEs. Every KSE now has a thread, which is
considered its "owner" however a KSE may also be lent to other
threads in the same group to allow completion of in-kernel work.
n this case the owner remains the same and the KSE will revert to the
owner when the other work has been completed.

All creations of upcalls etc. is now done from
kse_reassign() which in turn is called from mi_switch or
thread_exit(). This means that special code can be removed from
msleep() and cv_wait().

kse_release() does not leave a KSE with no thread any more but
converts the existing thread into teh KSE's owner, and sets it up
for doing an upcall. It is just inhibitted from being scheduled until
there is some reason to do an upcall.

Remove all trace of the kse_idle queue since it is no-longer needed.
"Idle" KSEs are now on the loanable queue.


# 107981 17-Dec-2002 phk

Don't cast a pointer to (intptr_t) and then on to (int) when we cannot
be sure that (int) is large enough. Instead cast only to (intptr_t) and
cast the switch/case values to (intptr_t) as well.


# 105950 25-Oct-2002 peter

Split 4.x and 5.x signal handling so that we can keep 4.x signal
handling clean and functional as 5.x evolves. This allows some of the
nasty bandaids in the 5.x codepaths to be unwound.

Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an
anti-foot-shooting measure in place, 5.x folks need this for a while) and
finish encapsulating the older stuff under COMPAT_43. Since the ancient
stuff is required on alpha (longjmp(3) passes a 'struct osigcontext *'
to the current sigreturn(2), instead of the 'ucontext_t *' that sigreturn
is supposed to take), add a compile time check to prevent foot shooting
there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc.

Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago).
Approved by: re


# 104363 02-Oct-2002 phk

Fix mis-indentation.

Spotted by: FlexeLint


# 104306 01-Oct-2002 jmallett

Back our kernel support for reliable signal queues.

Requested by: rwatson, phk, and many others


# 104246 30-Sep-2002 jmallett

Back out code changes that snuck into the previous forced commit.


# 104245 30-Sep-2002 jmallett

(Forced commit, to clarify previous commit of ksiginfo/signal queue code.)

I've added a structure, kernel-private, to represent a pending or in-delivery
signal, called `ksiginfo'. It is roughly analogous to the basic information
that is exported by the POSIX interface 'siginfo_t', but more basic. I've
added functions to allocate these structures, and further to wrap all signal
operations using them.

Once the operations are wrapped, I've added a TailQ (see queue(3)) of these
structures to 'struct proc', and all pending signals are in that TailQ. When
a signal is being delivered, it is dequeued from the list. Once I finish
the spreading of ksiginfo throughout the tree, the dequeued structure will be
delivered to the process in question, whereas currently and normally, the
signal number is what is used.


# 104233 30-Sep-2002 jmallett

First half of implementation of ksiginfo, signal queues, and such. This
gets signals operating based on a TailQ, and is good enough to run X11,
GNOME, and do job control. There are some intricate parts which could be
more refined to match the sigset_t versions, but those require further
evaluation of directions in which our signal system can expand and contract
to fit our needs.

After this has been in the tree for a while, I will make in kernel API
changes, most notably to trapsignal(9) and sendsig(9), to use ksiginfo
more robustly, such that we can actually pass information with our
(queued) signals to the userland. That will also result in using a
struct ksiginfo pointer, rather than a signal number, in a lot of
kern_sig.c, to refer to an individual pending signal queue member, but
right now there is no defined behaviour for such.

CODAFS is unfinished in this regard because the logic is unclear in
some places.

Sponsored by: New Gold Technology
Reviewed by: bde, tjr, jake [an older version, logic similar]


# 104129 29-Sep-2002 obrien

Fix style nit where conditionally compiled code was unconditionalized,
but style(9) was consulted.

Submitted by: bde


# 104094 28-Sep-2002 phk

Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by: FlexeLint warning #512


# 103410 16-Sep-2002 mini

Add kernel support needed for the KSE-aware libpthread:
- Use ucontext_t's to store KSE thread state.
- Synthesize state for the UTS upon each upcall, rather than
saving and copying a trapframe.
- Deliver signals to KSE-aware processes via upcall.
- Rename kse mailbox structure fields to be more BSD-like.
- Store the UTS's stack in struct proc in a stack_t.

Reviewed by: bde, deischen, julian
Approved by: -arch


# 103367 15-Sep-2002 julian

Allocate KSEs and KSEGRPs separatly and remove them from the proc structure.
next step is to allow > 1 to be allocated per process. This would give
multi-processor threads. (when the rest of the infrastructure is
in place)

While doing this I noticed libkvm and sys/kern/kern_proc.c:fill_kinfo_proc
are diverging more than they should.. corrective action needed soon.


# 103216 11-Sep-2002 julian

Completely redo thread states.

Reviewed by: davidxu@freebsd.org


# 102950 05-Sep-2002 davidxu

s/SGNL/SIG/
s/SNGL/SINGLE/
s/SNGLE/SINGLE/

Fix abbreviation for P_STOPPED_* etc flags, in original code they were
inconsistent and difficult to distinguish between them.

Approved by: julian (mentor)


# 102898 03-Sep-2002 davidxu

In the kernel code, we have the tsleep() call with the PCATCH argument.
PCATCH means 'if we get a signal, interrupt me!" and tsleep returns
either EINTR or ERESTART depending on the circumstances. ERESTART is
"special" because it causes the system call to fail, but right as it
returns back to userland it tells the trap handler to move %eip back a
bit so that userland will immediately re-run the syscall.
This is a syscall restart. It only works for things like read() etc where
nothing has changed yet. Note that *userland* is tricked into restarting
the syscall by the kernel. The kernel doesn't actually do the restart. It
is deadly for things like select, poll, nanosleep etc where it might cause
the elapsed time to be reset and start again from scratch. So those
syscalls do this to prevent userland rerunning the syscall:
if (error == ERESTART) error = EINTR;

Fake "signals" like SIGTSTP from ^Z etc do not normally invoke userland
signal handlers. But, in -current, the PCATCH *is* being triggered and
tsleep is returning ERESTART, and the syscall is aborted even though no
userland signal handler was run.
That is the fault here. We're triggering the PCATCH in cases that we
shouldn't. ie: it is being triggered on *any* signal processing, rather
than the case where the signal is posted to userland.
--- Peter

The work of psignal() is a patchwork of special case required by the process
debugging and job-control facilities...
--- Kirk McKusick
"The design and impelementation of the 4.4BSD Operating system"
Page 105

in STABLE source, when psignal is posting a STOP signal to sleeping
process and the signal action of the process is SIG_DFL, system will
directly change the process state from SSLEEP to SSTOP, and when
SIGCONT is posted to the stopped process, if it finds that the process
is still on sleep queue, the process state will be restored to SSLEEP,
and won't wakeup the process.

this commit mimics the behaviour in STABLE source tree.

Reviewed by: Jon Mini, Tim Robbins, Peter Wemm
Approved by: julian@freebsd.org (mentor)


# 102779 01-Sep-2002 iedowse

Split out a number of mostly VFS and signal related syscalls into
a kernel-internal kern_*() version and a wrapper that is called via
the syscall vector table. For paths and structure pointers, the
internal version either takes a uio_seg parameter or requires the
caller to copyin() the data to kernel memory as appropiate. This
will permit emulation layers to use these syscalls without having
to copy out translated arguments to the stack gap.

Discussed on: -arch
Review/suggestions: bde, jhb, peter, marcel


# 102433 26-Aug-2002 julian

move the assert to cover more cases


# 102309 23-Aug-2002 julian

Don't re-lock the sched lock if we didn't unlock it.

Original error by: David Xu <bsddiy@yahoo.com>
Fix by: David Xu <bsddiy@yahoo.com>
Completely failed to spot it: Julian Elischer <julian@freebsd.org>


# 102238 21-Aug-2002 julian

Revert some suspension/sleep/signal code from KSE-III
We need to rethink a bit of this and it doesn't matter if
we break the KSE test program for now as long
as non-KSE programs act as expected.

Submitted by: David Xu <bsddiy@yahoo.com>
(this guy's just asking to get hit with a commit bit..)


# 101500 08-Aug-2002 julian

Do some work on keeping better track of stopped/continued state.
I'm not sure what happenned to the original setting of the P_CONTINUED
flag. it appears to have been lost in the paper shuffling...

Submitted by: David Xu <bsddiy@yahoo.com>


# 101427 06-Aug-2002 bde

Try harder to "set signal flags proprly [sic] for ast()". See rev.1.154.


# 101176 01-Aug-2002 julian

Slight cleanup of some comments/whitespace.
Make idle process state more consistant.
Add an assert on thread state.
Clean up idleproc/mi_switch() interaction.
Use a local instead of referencing curthread 7 times in a row
(I've been told curthread can be expensive on some architectures)
Remove some commented out code.
Add a little commented out code (completion coming soon)

Reviewed by: jhb@freebsd.org


# 100976 30-Jul-2002 julian

Don't need to hold schedlock specifically for stop() ans it calls wakeup()
that locks it anyhow.

Reviewed by: jhb@freebsd.org


# 100593 24-Jul-2002 julian

revert some of the handling of STOP signals in
issignal(). Let thread_suspend_check() actually do the suspension
at the user boundary.

Submitted by: David Xu <bsddiy@yahoo.com>


# 99712 10-Jul-2002 truckman

Rearrange the code so that it checks whether the file is something
valid to write a core dump to before doing the preparations to actually
write to the file.

Call VOP_GETATTR() before dropping the initial vnode lock.


# 99337 03-Jul-2002 julian

Try clean up some of the mess that resulted from layers and layers
of p4 merges from -current as things started getting different.

Corroborated by: Similar patches just mailed by BDE.


# 99329 03-Jul-2002 julian

White space commit.
I'm working on this file but I wanted to make the whitespece commit
separatly.


# 99325 03-Jul-2002 gallatin

Hold the sched lock across call to forward_signal() in tdsignal() to
keep SMP systems from panic'ing when ^C'ing an app

suggested by julian


# 99072 29-Jun-2002 julian

Part 1 of KSE-III

The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


# 99012 29-Jun-2002 alfred

more caddr_t removal.


# 97999 07-Jun-2002 jhb

- trapsignal() no longer needs to acquire Giant for ktrpsig().
- Catch up to new ktrace API.


# 97950 06-Jun-2002 davidc

s/!SIGNOTEMPY/SIGISEMPTY/

Reviewed by: marcel, jhb, alfred


# 97714 01-Jun-2002 mike

Add POSIX.1-2001 WCONTINUED option for waitpid(2). A proc flag
(P_CONTINUED) is set when a stopped process receives a SIGCONT and
cleared after it has notified a parent process that has requested
notification via waitpid(2) with WCONTINUED specified in its options
operand. The status value can be checked with the new WIFCONTINUED()
macro.

Reviewed by: jake


# 97526 29-May-2002 julian

CURSIG() is not a macro so rename it cursig().

Obtained from: KSE tree


# 96886 18-May-2002 jhb

Change p_can{debug,see,sched,signal}()'s first argument to be a thread
pointer instead of a proc pointer and require the process pointed to
by the second argument to be locked. We now use the thread ucred reference
for the credential checks in p_can*() as a result. p_canfoo() should now
no longer need Giant.


# 96620 14-May-2002 rwatson

p_cansignal() returns an errno value; at some point, the check for
inter-process signalling ceased to preserve and return that value,
instead always returning EPERM. This meant that it was possible
to "probe" the pid space for processes that were not otherwise
visible. This change reverts that reversion.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 96244 09-May-2002 mini

Remove trace_req().

Reviewed by: alfred, jhb, peter


# 96215 08-May-2002 alfred

expand_name fixes:

.) don't use MAXPATHLEN + 1, fix logic to compensate.
.) style(9) function parameters.
.) fix line wrapping.
.) remove duplicated error and string handling code.
.) don't NUL terminate already NUL terminated string.
.) all string length variables changed from int to size_t.
.) constify variables.
.) catch when corename would be truncated.
.) cast pid_t and uid_t args for format string.
.) add parens around return arguments.

Help and suggestions from: bde


# 96190 07-May-2002 alfred

M_ZERO the temp buffer in expand_name() otherwise if an error occurs
while logging we may pass a non NUL terminated string to log(9) for a
%s format arg.


# 96054 05-May-2002 bde

Return the correct error code (ENOSYS, not EINVAL) from nosys(). Getting
killed by SIGSYS for unimlemented syscalls is bad enough.

Obtained from: Lite2 branch

The Lite2 branch has some other interesting unmerged (?) bits in this
file. They are well hidden among cosmetic regressions.


# 95936 02-May-2002 jhb

- Reorder execve() so that it performs blocking operations before it
locks the process.
- Defer other blocking operations such as vrele()'s until after we
release locks.
- execsigs() now requires the proc lock to be held when it is called
rather than locking the process internally.


# 95883 01-May-2002 alfred

Redo the sigio locking.

Turn the sigio sx into a mutex.

Sigio lock is really only needed to protect interrupts from dereferencing
the sigio pointer in an object when the sigio itself is being destroyed.

In order to do this in the most unintrusive manner change pgsigio's
sigio * argument into a **, that way we can lock internally to the
function.


# 95594 27-Apr-2002 iedowse

Avoid the user-visible effect of setting SA_NOCLDWAIT when the
SIGCHLD handler is SIG_IGN. This is a reimplementation of the
problematic revision 1.131 of kern_exit.c. To avoid accessing process
UPAGES, we set a new procsig flag when the SIGCHLD handler is SIG_IGN
and use that instead.


# 94861 16-Apr-2002 jhb

Lock proctree_lock instead of pgrpsess_lock.


# 94627 13-Apr-2002 jhb

- Change killpg1()'s first argument to be a thread instead of a process so
we can use td_ucred.
- In killpg1(), the proc lock is sufficient to check if p_stat is SZOMB
or not. We don't need sched_lock.
- Close some races in psignal(). In psignal() there is a big switch
statement based on p_stat. All the different cases are assuming that
the process (or thread) isn't going to change state out from under it.
To ensure this is true, just lock sched_lock for the entire switch. We
practically held it the entire time already anyways. This also
simplifies the locking somewhat and actually results in fewer lock
operations.
- Allow signotify() to be called with the sched_lock held since psignal()
now does that.
- Use td_ucred in a couple of places.


# 93793 04-Apr-2002 bde

Moved signal handling and rescheduling from userret() to ast() so that
they aren't in the usual path of execution for syscalls and traps.
The main complication for this is that we have to set flags to control
ast() everywhere that changes the signal mask.

Avoid locking in userret() in most of the remaining cases.

Submitted by: luoqi (first part only, long ago, reorganized by me)
Reminded by: dillon


# 93786 04-Apr-2002 bde

Optimized the check for unmasked pending signals in CURSIG() using a new
inline function sigsetmasked() and a new macro SIGPENDING(). CURSIG()
will soon be moved out of the normal path of execution for syscalls and
traps. Then its efficiency will be less important but the new interfaces
will be useful for checking for unmasked pending signals in more places.

Submitted by: luoqi (long ago, in a slightly different form)

Assert that sched_lock is not held in CURSIG().


# 93076 24-Mar-2002 bde

Fixed some style bugs in the removal of __P(()). The main ones were
not removing tabs before "__P((", and not outdenting continuation lines
to preserve non-KNF lining up of code with parentheses. Switch to KNF
formatting and/or rewrap the whole prototype in some cases.


# 92723 19-Mar-2002 alfred

Remove __P.


# 91291 26-Feb-2002 phk

Fix warning in !SMP case.

Submitted by: Maxime Henrion <mux@mu.org>


# 91140 23-Feb-2002 tanimura

Lock struct pgrp, session and sigio.

New locks are:

- pgrpsess_lock which locks the whole pgrps and sessions,
- pg_mtx which protects the pgrp members, and
- s_mtx which protects the session members.

Please refer to sys/proc.h for the coverage of these locks.

Changes on the pgrp/session interface:

- pgfind() needs the pgrpsess_lock held.

- The caller of enterpgrp() is responsible to allocate a new pgrp and
session.

- Call enterthispgrp() in order to enter an existing pgrp.

- pgsignal() requires a pgrp lock held.

Reviewed by: jhb, alfred
Tested on: cvsup.jp.FreeBSD.org
(which is a quad-CPU machine running -current)


# 90670 15-Feb-2002 bde

Fixed a typo in rev.1.65 that gave a reference to a nonexistent variable.
This was not detected by LINT because LINT is missing COMPAT_SUNOS.


# 90538 11-Feb-2002 julian

In a threaded world, differnt priorirites become properties of
different entities. Make it so.

Reviewed by: jhb@freebsd.org (john baldwin)


# 90487 10-Feb-2002 rwatson

Add a comment indicating that VOP_GETATTR() is called without appropriate
locking in the core dump code. This should be fixed.


# 90361 07-Feb-2002 julian

Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,


# 89172 09-Jan-2002 rwatson

o Revert kern_sig.c#1.143, as cr_cansignal() doesn't currently permit
a number of desirable cases in which SIGIO/SIGURG are delivered. We'll
keep tweaking.

Reported by: Alexander Kabaev <ak03@gte.com>


# 88944 05-Jan-2002 rwatson

- Teach SIGIO code to use cr_cansignal() instead of a custom CANSIGIO()
macro. As a result, mandatory signal delivery policies will be
applied consistently across the kernel.

- Note that this subtly changes the protection semantics, and we should
watch out for any resulting breakage. Previously, delivery of SIGIO
in this circumstance was limited to situations where the subject was
privileged, or where one of the subject's (ruid, euid) matched one
of the object's (ruid, euid). In the new scenario, subject (ruid, euid)
are matched against the object's (ruid, svuid), and the object uid's
must be a subset of the subject uid's. Likewise, jail now affects
delivery, and special handling for P_SUGID of the object is present.
This change can always be reversed or tweaked if it proves to disrupt
application behavior substantially.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 88900 05-Jan-2002 jhb

Change the preemption code for software interrupt thread schedules and
mutex releases to not require flags for the cases when preemption is
not allowed:

The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
switching to a higher priority thread on mutex releease and swi schedule,
respectively when that switch is not safe. Now that the critical section
API maintains a per-thread nesting count, the kernel can easily check
whether or not it should switch without relying on flags from the
programmer. This fixes a few bugs in that all current callers of
swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
fast interrupt handlers and the swi_sched of softclock needed this flag.
Note that to ensure that swi_sched()'s in clock and fast interrupt
handlers do not switch, these handlers have to be explicitly wrapped
in critical_enter/exit pairs. Presently, just wrapping the handlers is
sufficient, but in the future with the fully preemptive kernel, the
interrupt must be EOI'd before critical_exit() is called. (critical_exit()
can switch due to a deferred preemption in a fully preemptive kernel.)

I've tested the changes to the interrupt code on i386 and alpha. I have
not tested ia64, but the interrupt code is almost identical to the alpha
code, so I expect it will work fine. PowerPC and ARM do not yet have
interrupt code in the tree so they shouldn't be broken. Sparc64 is
broken, but that's been ok'd by jake and tmm who will be fixing the
interrupt code for sparc64 shortly.

Reviewed by: peter
Tested on: i386, alpha


# 87828 13-Dec-2001 rwatson

o Wording fix in comment.

Submitted by: tanimura via p4


# 85971 03-Nov-2001 peter

_SIG_MAXSIG (128) is the highest legal signal. The arrays are offset
by one - see _SIG_IDX(). Revert part of my mis-correction in kern_sig.c
(but signal 0 still has to be allowed) and fix _SIG_VALID() (it was
rejecting ignal 128).


# 85967 03-Nov-2001 peter

Partial reversion of rev 1.138. kill and killpg allow a signal
argument of 0. You cannot return EINVAL for signal 0. This broke
(in 5 minutes of testing) at least ssh-agent and screen.

However, there was a bug in the original code. Signal 128 is not
valid.

Pointy-hat to: des, jhb


# 85925 02-Nov-2001 des

We have a _SIG_VALID() macro, so use it instead of duplicating the test all
over the place. Also replace a printf() + panic() with a KASSERT().

Reviewed by: jhb


# 84622 07-Oct-2001 iedowse

Fix a typo in do_sigaction() where sa_sigaction and sa_handler were
confused. Since sa_sigaction and sa_handler alias each other in a
union, the bug was completely harmless. This had been fixed as part
of the SIGCHLD changes in revision 1.125, but it was reverted when
they were backed out in revision 1.126.


# 83952 25-Sep-2001 ps

Lock the vnode while truncating the corefile. This fixes a panic
with softupdates dangling deps.

Submitted by: peter
MFC: ASAP :)


# 83591 17-Sep-2001 julian

Replace line accidentally deleted during KSE additions.
Symptom.. Stopped program unable to be restarted if it was stopped
while already sleeping.


# 83525 15-Sep-2001 rwatson

o Correct authorization check in CANSIGIO(), which suffered from incorrect
transcription during the (pcred,ucred) merge; this was not used for
the kill() system call, so does not affect direct explicit process
signalling.

Pointed out by: fenner


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 83222 08-Sep-2001 dillon

This brings in a Yahoo coredump patch from Paul, with additional mods by
me (addition of vn_rdwr_inchunks). The problem Yahoo is solving is that
if you have large process images core dumping, or you have a large number of
forked processes all core dumping at the same time, the original coredump code
would leave the vnode locked throughout. This can cause the directory vnode
to get locked up, which can cause the parent directory vnode to get locked
up, and so on all the way to the root node, locking the entire machine up
for extremely long periods of time.

This patch solves the problem in two ways. First it uses an advisory
non-blocking lock to abort multiple processes trying to core to the same
file. Second (my contribution) it chunks up the writes and uses bwillwrite()
to avoid holding the vnode locked while blocking in the buffer cache.

Submitted by: ps
Reviewed by: dillon
MFC after: 2 weeks


# 83163 06-Sep-2001 jhb

Call sendsig() with the proc lock held and return with it held.


# 82746 01-Sep-2001 dillon

Giant Pushdown

clock_gettime() clock_settime() nanosleep() settimeofday()
adjtime() getitimer() setitimer() __sysctl() ogetkerninfo()
sigaction() osigaction() sigpending() osigpending() osigvec()
osigblock() osigsetmask() sigsuspend() osigsuspend() osigstack()
sigaltstack() kill() okillpg() trapsignal() nosys()


# 82585 30-Aug-2001 dillon

Remove the MPSAFE keyword from the parser for syscalls.master.
Instead introduce the [M] prefix to existing keywords. e.g.
MSTD is the MP SAFE version of STD. This is prepatory for a
massive Giant lock pushdown. The old MPSAFE keyword made
syscalls.master too messy.

Begin comments MP-Safe procedures with the comment:
/*
* MPSAFE
*/
This comments means that the procedure may be called without
Giant held (The procedure itself may still need to obtain
Giant temporarily to do its thing).

sv_prepsyscall() is now MP SAFE and assumed to be MP SAFE
sv_transtrap() is now MP SAFE and assumed to be MP SAFE

ktrsyscall() and ktrsysret() are now MP SAFE (Giant Pushdown)
trapsignal() is now MP SAFE (Giant Pushdown)

Places which used to do the if (mtx_owned(&Giant)) mtx_unlock(&Giant)
test in syscall[2]() in */*/trap.c now do not. Instead they
explicitly unlock Giant if they previously obtained it, and then
assert that it is no longer held to catch broken system calls.

Rebuild syscall tables.


# 82278 24-Aug-2001 roam

Prevent passing a null pointer as a filename to vn_open(),
if for some reason expand_name() failed to build a core file name.

PR: 29931
Submitted by: Foldi Tamas <crow@kapu.hu>
Reviewed by: dd, -arch
MFC after: 1 month


# 82025 21-Aug-2001 peter

Make COMPAT_43 optional again. XXX we need COMPAT_FBSD3 etc for this
stuff.


# 80974 01-Aug-2001 peter

Temporarily back out kern_sig.c rev 1.125 and kern_exit.c rev 1.131.
This paniced my one of my machines one time too many :-( and there is
no sign of a solution in the pipeline. The deltas are still easily
available in cvs. The problem is that if the parent has been swapped
out, the child process cannot grope around in the parent's UPAGES to
see the sigact[] array or it will fault. This probably is a showstopper
for this implementation anyway.


# 80156 22-Jul-2001 dillon

As per further discussions on hackers redo the SIGCHLD patch to not generate
an unexpected user-visible side effect with the sigaction flags. Also cleanup
a minor union issue.

Submitted by: Rudolf Cejka <cejkar@dcse.fee.vutbr.cz>
MFC addendum: MFC will be combined w/ original commit
MFC after: 3 days


# 79125 03-Jul-2001 jhb

Grab Giant around postsig() since sendsig() can call into the vm to
grow the stack and we already needed Giant for KTRACE.


# 78635 22-Jun-2001 jhb

- Change CURSIG() and postsig() to require that the proc lock is held
rather than grabbing it and releasing it themselves. This allows callers
of these functions to get the lock to close race conditions.
- Grab Giant around ktrace in postsig.
- Count the switches performed on SIGSTOP's as involuntary context switches
in the resource usage stats.

Reported by: tegge (signal race), bde (missing csw stats)


# 78428 18-Jun-2001 jhb

Lock Giant in postsig() for the KTRACE case as ktrpsig() needs Giant when
it writes out to the trace file.

Reported by: peter, gallatin, and others


# 78056 11-Jun-2001 dwmalone

Try to make the setting of the SIGCHLD handler the same as setting of
the NOCLDWAI flag. Susv2 seems to require this.

Submitted by: Cejka Rudolf <cejkar@dcse.fee.vutbr.cz>
Reviewed by: dillon


# 77183 25-May-2001 rwatson

o Merge contents of struct pcred into struct ucred. Specifically, add the
real uid, saved uid, real gid, and saved gid to ucred, as well as the
pcred->pc_uidinfo, which was associated with the real uid, only rename
it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
original macro that pointed.
p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
we figure out locking and optimizations; generally speaking, this
means moving to a structure like this:
newcred = crdup(oldcred);
...
p->p_ucred = newcred;
crfree(oldcred);
It's not race-free, but better than nothing. There are also races
in sys_process.c, all inter-process authorization, fork, exec, and
exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
allocation.
o Clean up ktrcanset() to take into account changes, and move to using
suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
calls to better document current behavior. In a couple of places,
current behavior is a little questionable and we need to check
POSIX.1 to make sure it's "right". More commenting work still
remains to be done.
o Update credential management calls, such as crfree(), to take into
account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
change_euid()
change_egid()
change_ruid()
change_rgid()
change_svuid()
change_svgid()
In each case, the call now acts on a credential not a process, and as
such no longer requires more complicated process locking/etc. They
now assume the caller will do any necessary allocation of an
exclusive credential reference. Each is commented to document its
reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
processes and pcreds. Note that this authorization, as well as
CANSIGIO(), needs to be updated to use the p_cansignal() and
p_cansched() centralized authorization routines, as they currently
do not take into account some desirable restrictions that are handled
by the centralized routines, as well as being inconsistent with other
similar authorization instances.
o Update libkvm to take these changes into account.

Obtained from: TrustedBSD Project
Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit


# 76646 15-May-2001 jhb

- Remove unneeded include of sys/ipl.h.
- Require the proc lock be held for killproc() to allow for the vmdaemon to
kill a process when memory is exhausted while holding the lock of the
process to kill.


# 76336 07-May-2001 knu

Properly copy the P_ALTSTACK flag in struct proc::p_flag to the child
process on fork(2).

It is the supposed behavior stated in the manpage of sigaction(2), and
Solaris, NetBSD and FreeBSD 3-STABLE correctly do so.

The previous fix against libc_r/uthread/uthread_fork.c fixed the
problem only for the programs linked with libc_r, so back it out and
fix fork(2) itself to help those not linked with libc_r as well.

PR: kern/26705
Submitted by: KUROSAWA Takahiro <fwkg7679@mb.infoweb.ne.jp>
Tested by: knu, GOTOU Yuuzou <gotoyuzo@notwork.org>,
and some other people
Not objected by: hackers
MFC in: 3 days


# 76078 27-Apr-2001 jhb

Overhaul of the SMP code. Several portions of the SMP kernel support have
been made machine independent and various other adjustments have been made
to support Alpha SMP.

- It splits the per-process portions of hardclock() and statclock() off
into hardclock_process() and statclock_process() respectively. hardclock()
and statclock() call the *_process() functions for the current process so
that UP systems will run as before. For SMP systems, it is simply necessary
to ensure that all other processors execute the *_process() functions when the
main clock functions are triggered on one CPU by an interrupt. For the alpha
4100, clock interrupts are delievered in a staggered broadcast fashion, so
we simply call hardclock/statclock on the boot CPU and call the *_process()
functions on the secondaries. For x86, we call statclock and hardclock as
usual and then call forward_hardclock/statclock in the MD code to send an IPI
to cause the AP's to execute forwared_hardclock/statclock which then call the
*_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
involve less hackery. Now the cpu doing the forward sets any flags, etc. and
sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically
return so that they can execute ast() and don't bother with setting the
astpending or needresched flags themselves. This also removes the loop in
forward_signal() as sched_lock closes the race condition that the loop worked
around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
a process to act on rather than assuming curproc so that they can be used to
implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
header sys/smp.h declares MI SMP variables and API's. The IPI API's from
machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
SLIST of globaldata structures has become MI and moved into subr_smp.c.
Also, the globaldata list is only available if SMP support is compiled in.

Reviewed by: jake, peter
Looked over by: eivind


# 75893 23-Apr-2001 jhb

Change the pfind() and zpfind() functions to lock the process that they
find before releasing the allproc lock and returning.

Reviewed by: -smp, dfr, jake


# 75437 12-Apr-2001 rwatson

o Replace p_cankill() with p_cansignal(), remove wrappage of p_can()
from signal authorization checking.
o p_cansignal() takes three arguments: subject process, object process,
and signal number, unlike p_cankill(), which only took into account
the processes and not the signal number, improving the abstraction
such that CANSIGNAL() from kern_sig.c can now also be eliminated;
previously CANSIGNAL() special-cased the handling of SIGCONT based
on process session. privused is now deprecated.
o The new p_cansignal() further limits the set of signals that may
be delivered to processes with P_SUGID set, and restructures the
access control check to allow it to be extended more easily.
o These changes take into account work done by the OpenBSD Project,
as well as by Robert Watson and Thomas Moestl on the TrustedBSD
Project.

Obtained from: TrustedBSD Project


# 75104 02-Apr-2001 jhb

Change stop() to require the sched_lock as well as p's process lock to
avoid silly lock contention on sched_lock since in 2 out of the 3 places
that we call stop(), we get sched_lock right after calling it and we were
locking sched_lock inside of stop() anyways.


# 75091 02-Apr-2001 jhb

- Move the second stop() of process 'p' in issignal() to be after we send
SIGCHLD to our parent process. Otherwise, we could block while obtaining
the process lock for our parent process and switch out while we were
in SSTOP. Even worse, when we try to resume from the mutex being blocked
on our p_stat will be SRUN, not SSTOP.
- Fix a comment above stop() to indicate that it requires that the proc lock
be held, not a proctree lock.

Reported by: markm
Sleuthing by: jake


# 74927 28-Mar-2001 jhb

Convert the allproc and proctree locks from lockmgr locks to sx locks.


# 74911 28-Mar-2001 jhb

- Resort some includes to deal with the new witness code coming in shortly.
- Make sure we have Giant locked before calling coredump() in sigexit().

Spotted by: peter (2)


# 73914 07-Mar-2001 jhb

- Proc locking. Most of signal handling is now MP safe and doesn't require
Giant. The only exception is the CANSIGNAL() macro. Unlocking the proc
lock around sendsig() in trapsignal() is also questionable. Note that
the functions sigexit(), psignal(), and issignal() must be called with
the proc lock of the process in question held. postsig() and
trapsignal() should not be called with the proc lock held, but they
also do not require Giant anymore either.
- Remove spl's that are now no longer needed as they are fully replaced.


# 72688 19-Feb-2001 bde

Fixed a longstanding latency bug in signal delivery. When a signal
is sent to a process, psignal() needs to schedule an AST for the
process if the process is runnable, not just if it is current, so that
pending signals get checked for on the next return of the process to
user mode. This wasn't practical until recently because the AST flag
was per-cpu so setting it for a non-current process would usually just
cause a bogus AST for the current process.

For non-current processes looping in user mode, it took accidental
(?) magic to deliver signals at all. Signals were usually delivered
late as a side effect of rescheduling (need_resched() sets astpending,
etc.). In pre-SMPng, delivery was delayed by at most 1 quantum (the
need_resched() call in roundrobin() is certain to occur within 1
quantum for looping processes). In -current, things are complicated
by normal interrupt handlers being threads. Missing handling of the
complications makes roundrobin() a bogus no-op, but preemptive
scheduling sort of works anyway due to even larger bogons elsewhere.


# 72376 11-Feb-2001 jake

Implement a unified run queue and adjust priority levels accordingly.

- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propogated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propogation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propogate_priority(). It should now have the desired
effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propogate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propogate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at apprioriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. Deliberately
change its size by adjusting the spare fields. It remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.


# 72276 10-Feb-2001 jhb

- Make astpending and need_resched process attributes rather than CPU
attributes. This is needed for AST's to be properly posted in a preemptive
kernel. They are backed by two new flags in p_sflag: PS_ASTPENDING and
PS_NEEDRESCHED. They are still accesssed by their old macros:
aston(), astoff(), etc. For completeness, an astpending() macro has been
added to check for a pending AST, and clear_resched() has been added to
clear need_resched().
- Rename syscall2() on the x86 back to syscall() to be consistent with
other architectures.


# 72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


# 71563 24-Jan-2001 jhb

- Proc locking.
- Catch up to proc flag changes.


# 71232 19-Jan-2001 jhb

Revert revision 1.102. I don't think p_nice needs to be protected with
sched_lock, and I'm fairly certain P_TRACED will be protected with the
proc lock instead.

Pointed out indirectly by: bde


# 71088 15-Jan-2001 jasone

Implement condition variables.


# 70698 05-Jan-2001 jhb

Protect p_nice and P_TRACED in psignal() above the switch statement with
sched_lock.


# 70603 02-Jan-2001 jhb

The previous commit wasn't entirely correct. At least one goto to the
out: label in psignal() did not grab sched_lock before trying to release
it. Also, the previous version had several cases where it grabbed
sched_lock before jumping to out: unneccessarily, so rework this a bit.
The runfast: and out: labels must be called with sched_lock released, and
the run: label must be called with it held. Appropriate mtx_assert()'s
have been added that should catch any bugs that may still be in this
code.

Noticed by: bde


# 70552 01-Jan-2001 jhb

Push down sched_lock in psignal(). sched_lock was being held across
recursive calls into psignal() as well as calls to signotify(),
forward_signal(), etc.


# 70550 01-Jan-2001 jhb

Add in a missing release of the proctree lock.

Submitted by: Sja <sakari.jalovaara@eqonline.fi>


# 70317 23-Dec-2000 jake

Protect proc.p_pptr and proc.p_children/p_sibling with the
proctree_lock.

linprocfs not locked pending response from informal maintainer.

Reviewed by: jhb, -smp@


# 70104 16-Dec-2000 marcel

Fix a typo that allowed signals caused by traps to be delivered
to the process when said signal is masked.

PR: 23457
Submitted by: Yasuhiko Watanabe <yasu@mrit.mei.co.jp>


# 69947 12-Dec-2000 jake

- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr. Also provides macros for the flags
pased to specify shared, exclusive or release which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.


# 69501 01-Dec-2000 jhb

Protect p_stat with sched_lock.


# 69379 30-Nov-2000 marcel

Don't use p->p_sigstk.ss_flags to keep state of whether the
process is on the alternate stack or not. For compatibility
with sigstack(2) state is being updated if such is needed.

We now determine whether the process is on the alternate
stack by looking at its stack pointer. This allows a process
to siglongjmp from a signal handler on the alternate stack
to the place of the sigsetjmp on the normal stack. When
maintaining state, this would have invalidated the state
information and causing a subsequent signal to be delivered
on the normal stack instead of the alternate stack.

PR: 22286


# 69022 22-Nov-2000 jake

Protect the following with a lockmgr lock:

allproc
zombproc
pidhashtbl
proc.p_list
proc.p_hash
nextpid

Reviewed by: jhb
Obtained from: BSD/OS and netbsd


# 68862 17-Nov-2000 jake

- Split the run queue and sleep queue linkage, so that a process
may block on a mutex while on the sleep queue without corrupting
it.
- Move dropping of Giant to after the acquire of sched_lock.

Tested by: John Hay <jhay@icomtek.csir.co.za>
jhb


# 68808 16-Nov-2000 jhb

Don't release and acquire Giant in mi_switch(). Instead, release and
acquire Giant as needed in functions that call mi_switch(). The releases
need to be done outside of the sched_lock to avoid potential deadlocks
from trying to acquire Giant while interrupts are disabled.

Submitted by: witness


# 68520 09-Nov-2000 marcel

Make MINSIGSTKSZ machine dependent, and have the sigaltstack
syscall compare against a variable sv_minsigstksz in struct
sysentvec as to properly take the size of the machine- and
ABI dependent struct sigframe into account.

The SVR4 and iBCS2 modules continue to have a minsigstksz of
8192 to preserve behavior. The real values (if different) are
not known at this time. Other ABI modules use the real
values.

The native MINSIGSTKSZ is now defined as follows:

Arch MINSIGSTKSZ
---- -----------
alpha 4096
i386 2048
ia64 12288

Reviewed by: mjacob
Suggested by: bde


# 67365 20-Oct-2000 jhb

Catch up to moving headers:
- machine/ipl.h -> sys/ipl.h
- machine/mutex.h -> sys/mutex.h


# 65988 17-Sep-2000 bde

Unpessimized CURSIG(). The fast path through CURSIG() was broken in
the 128-bit sigset_t changes by moving conditionally (rarely) executed
code to the beginning where it is always executed, and since this code
now involves 3 128-bit operations, the pessimization was relatively
large. This change speeds up lmbench's pipe latency benchmark by
3.5%.

Fixed style bugs in CURSIG().


# 65987 17-Sep-2000 bde

Uninlined CURSIG() and unpolluted <sys/signalvar.h>. CURSIG() had become
very bloated, first with 128-bit sigset_t's, then with locking in the
SMP case, then with locking in all cases. The space bloat was probably
also time bloat, partly because the fast path through CURSIG() was
pessimized by the sigset_t changes. This change speeds up lmbench's
pipe-based latency benchmark by 4% on a Celeron. <sys/signalvar.h>
had become very polluted to support the bloat.


# 65687 10-Sep-2000 dfr

Move the include of <sys/systm.h> so that KTR gets a declaration for
snprintf().


# 65557 06-Sep-2000 jasone

Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).

Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh


# 65237 30-Aug-2000 rwatson

o Centralize inter-process access control, introducing:

int p_can(p1, p2, operation, privused)

which allows specification of subject process, object process,
inter-process operation, and an optional call-by-reference privused
flag, allowing the caller to determine if privilege was required
for the call to succeed. This allows jail, kern.ps_showallprocs and
regular credential-based interaction checks to occur in one block of
code. Possible operations are P_CAN_SEE, P_CAN_SCHED, P_CAN_KILL,
and P_CAN_DEBUG. p_can currently breaks out as a wrapper to a
series of static function checks in kern_prot, which should not
be invoked directly.

o Commented out capabilities entries are included for some checks.

o Update most inter-process authorization to make use of p_can() instead
of manual checks, PRISON_CHECK(), P_TRESPASS(), and
kern.ps_showallprocs.

o Modify suser{,_xxx} to use const arguments, as it no longer modifies
process flags due to the disabling of ASU.

o Modify some checks/errors in procfs so that ENOENT is returned instead
of ESRCH, further improving concealment of processes that should not
be visible to other processes. Also introduce new access checks to
improve hiding of processes for procfs_lookup(), procfs_getattr(),
procfs_readdir(). Correct a bug reported by bp concerning not
handling the CREATE case in procfs_lookup(). Remove volatile flag in
procfs that caused apparently spurious qualifier warnigns (approved by
bde).

o Add comment noting that ktrace() has not been updated, as its access
control checks are different from ptrace(), whereas they should
probably be the same. Further discussion should happen on this topic.

Reviewed by: bde, green, phk, freebsd-security, others
Approved by: bde
Obtained from: TrustedBSD Project


# 65100 26-Aug-2000 marcel

Make this file compile again when COMPAT_43 has not been
defined. This boils down to conditionally compile the
old signal syscalls.

We might want to extend the types in syscalls.master to
make these syscalls conditionally on something more
appropriate than COMPAT_43.


# 62976 11-Jul-2000 mckusick

Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).


# 62550 04-Jul-2000 mckusick

Move the truncation code out of vn_open and into the open system call
after the acquisition of any advisory locks. This fix corrects a case
in which a process tries to open a file with a non-blocking exclusive
lock. Even if it fails to get the lock it would still truncate the
file even though its open failed. With this change, the truncation
is done only after the lock is successfully acquired.

Obtained from: BSD/OS


# 60938 26-May-2000 jake

Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others


# 60833 23-May-2000 jake

Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd


# 59794 30-Apr-2000 phk

Remove unneeded #include <vm/vm_zone.h>

Generated by: src/tools/tools/kerninclude


# 59288 16-Apr-2000 jlemon

Introduce kqueue() and kevent(), a kernel event notification facility.


# 58941 02-Apr-2000 dillon

Make the sigprocmask() and geteuid() system calls MP SAFE. Expand
commentary for copyin/copyout to indicate that they are MP SAFE as
well.

Reviewed by: msmith


# 58755 28-Mar-2000 dillon

The SMP cleanup commit broke UP compiles. Make UP compiles work again.


# 58717 28-Mar-2000 dillon

Commit major SMP cleanups and move the BGL (big giant lock) in the
syscall path inward. A system call may select whether it needs the MP
lock or not (the default being that it does need it).

A great deal of conditional SMP code for various deadended experiments
has been removed. 'cil' and 'cml' have been removed entirely, and the
locking around the cpl has been removed. The conditional
separately-locked fast-interrupt code has been removed, meaning that
interrupts must hold the CPL now (but they pretty much had to anyway).
Another reason for doing this is that the original separate-lock for
interrupts just doesn't apply to the interrupt thread mechanism being
contemplated.

Modifications to the cpl may now ONLY occur while holding the MP
lock. For example, if an otherwise MP safe syscall needs to mess with
the cpl, it must hold the MP lock for the duration and must (as usual)
save/restore the cpl in a nested fashion.

This is precursor work for the real meat coming later: avoiding having
to hold the MP lock for common syscalls and I/O's and interrupt threads.
It is expected that the spl mechanisms and new interrupt threading
mechanisms will be able to run in tandem, allowing a slow piecemeal
transition to occur.

This patch should result in a moderate performance improvement due to
the considerable amount of code that has been removed from the critical
path, especially the simplification of the spl*() calls. The real
performance gains will come later.

Approved by: jkh
Reviewed by: current, bde (exception.s)
Some work taken from: luoqi's patch


# 58416 21-Mar-2000 ps

Add sysctl kern.coredump to enable/disable core dumps system wide.


# 54655 15-Dec-1999 eivind

Introduce NDFREE (and remove VOP_ABORTOP)


# 53518 21-Nov-1999 phk

Introduce the new function
p_trespass(struct proc *p1, struct proc *p2)
which returns zero or an errno depending on the legality of p1 trespassing
on p2.

Replace kern_sig.c:CANSIGNAL() with call to p_trespass() and one
extra signal related check.

Replace procfs.h:CHECKIO() macros with calls to p_trespass().

Only show command lines to process which can trespass on the target
process.


# 53503 21-Nov-1999 phk

s/p_cred->pc_ucred/p_ucred/g


# 53212 16-Nov-1999 phk

This is a partial commit of the patch from PR 14914:

Alot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY
structures for list operations. This patch makes all list operations
in sys/kern use the queue(3) macros, rather than directly accessing the
*Q_{HEAD,ENTRY} structures.

This batch of changes compile to the same object files.

Reviewed by: phk
Submitted by: Jake Burkholder <jake@checker.org>
PR: 14914


# 52680 30-Oct-1999 sef

Bail out of the process early if the coredumpfile limit is 0.

PR: kern/14540
Reviewed by: Nate Williams


# 52156 12-Oct-1999 marcel

Don't let osigaction and osigvec accept the new signal numbers.

Fix style bugs caused by the sigset_t in general while I'm here.

Submitted by: bde


# 52140 11-Oct-1999 luoqi

Add a per-signal flag to mark handlers registered with osigaction, so we
can provide the correct context to each signal handler.

Fix broken sigsuspend(): don't use p_oldsigmask as a flag, use SAS_OLDMASK
as we did before the linuxthreads support merge (submitted by bde).

Move ps_sigstk from to p_sigacts to the main proc structure since signal
stack should not be shared among threads.

Move SAS_OLDMASK and SAS_ALTSTACK flags from sigacts::ps_flags to proc::p_flag.
Move PS_NOCLDSTOP and PS_NOCLDWAIT flags from proc::p_flag to procsig::ps_flag.

Reviewed by: marcel, jdp, bde


# 51791 29-Sep-1999 marcel

sigset_t change (part 2 of 5)
-----------------------------

The core of the signalling code has been rewritten to operate
on the new sigset_t. No methodological changes have been made.
Most references to a sigset_t object are through macros (see
signalvar.h) to create a level of abstraction and to provide
a basis for further improvements.

The NSIG constant has not been changed to reflect the maximum
number of signals possible. The reason is that it breaks
programs (especially shells) which assume that all signals
have a non-null name in sys_signame. See src/bin/sh/trap.c
for an example. Instead _SIG_MAXSIG has been introduced to
hold the maximum signal possible with the new sigset_t.

struct sigprop has been moved from signalvar.h to kern_sig.c
because a) it is only used there, and b) access must be done
though function sigprop(). The latter because the table doesn't
holds properties for all signals, but only for the first NSIG
signals.

signal.h has been reorganized to make reading easier and to
add the new and/or modified structures. The "old" structures
are moved to signalvar.h to prevent namespace polution.

Especially the coda filesystem suffers from the change, because
it contained lines like (p->p_sigmask == SIGIO), which is easy
to do for integral types, but not for compound types.

NOTE: kdump (and port linux_kdump) must be recompiled.

Thanks to Garrett Wollman and Daniel Eischen for pressing the
importance of changing sigreturn as well.


# 50754 01-Sep-1999 sef

Make prototype match function.


# 50717 31-Aug-1999 julian

General cleanup of core-dumping code.

Submitted by: Sean Fagan,


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 50237 23-Aug-1999 cracauer

Fix a mistake in my last SA_SIGINFO commit. Processes could block
SIGKILL and SIGSTOP.

PR: kern/13293
Submitted by: dwmalone@maths.tcd.ie
Obtained from: PR had correct fix


# 49899 16-Aug-1999 billf

expand_name:
use pid_t and uid_t in the declaration as that is what we are passed
fix printf formatters accordingly.

Reviewed by: green


# 49791 14-Aug-1999 alfred

Fix potential overflow, remove unnecessary bzero.
Pointed out by: green

remove redundant strlen, sprintf returns the length.

Reviewed by: peter


# 48883 18-Jul-1999 peter

Reset SA_NOCLDWAIT on exec().

PR: kern/12669
Submitted by: Doug Ambrisko <ambrisko@whistle.com>


# 48621 06-Jul-1999 cracauer

Implement SA_SIGINFO for i386. Thanks to Bruce Evans for much more
than a review, this was a nice puzzle.

This is supposed to be binary and source compatible with older
applications that access the old FreeBSD-style three arguments to a
signal handler.

Except those applications that access hidden signal handler arguments
bejond the documented third one. If you have applications that do,
please let me know so that we take the opportunity to provide the
functionality they need in a documented manner.

Also except application that use 'struct sigframe' directly. You need
to recompile gdb and doscmd. `make world` is recommended.

Example program that demonstrates how SA_SIGINFO and old-style FreeBSD
handlers (with their three args) may be used in the same process is at
http://www3.cons.org/tmp/fbsd-siginfo.c

Programs that use the old FreeBSD-style three arguments are easy to
change to SA_SIGINFO (although they don't need to, since the old style
will still work):

Old args to signal handler:
void handler_sn(int sig, int code, struct sigcontext *scp)

New args:
void handler_si(int sig, siginfo_t *si, void *third)
where:
old:code == new:second->si_code
old:scp == &(new:si->si_scp) /* Passed by value! */

The latter is also pointed to by new:third, but accessing via
si->si_scp is preferred because it is type-save.

FreeBSD implementation notes:
- This is just the framework to make the interface POSIX compatible.
For now, no additional functionality is provided. This is supposed
to happen now, starting with floating point values.
- We don't use 'sigcontext_t.si_value' for now (POSIX meant it for
realtime-related values).
- Documentation will be updated when new functionality is added and
the exact arguments passed are determined. The comments in
sys/signal.h are meant to be useful.

Reviewed by: BDE


# 46381 03-May-1999 billf

Add sysctl descriptions to many SYSCTL_XXXs

PR: kern/11197
Submitted by: Adrian Chadd <adrian@FreeBSD.org>
Reviewed by: billf(spelling/style/minor nits)
Looked at by: bde(style)


# 46155 28-Apr-1999 phk

This Implements the mumbled about "Jail" feature.

This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

I have no scripts for setting up a jail, don't ask me for them.

The IP number should be an alias on one of the interfaces.

mount a /proc in each jail, it will make ps more useable.

/proc/<pid>/status tells the hostname of the prison for
jailed processes.

Quotas are only sensible if you have a mountpoint per prison.

There are no privisions for stopping resource-hogging.

Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/


# 43208 26-Jan-1999 julian

Enable Linux threads support by default.
This takes the conditionals out of the code that has been tested by
various people for a while.
ps and friends (libkvm) will need a recompile as some proc structure
changes are made.

Submitted by: "Richard Seaman, Jr." <dick@tar.com>


# 42453 09-Jan-1999 eivind

KNFize, by bde.


# 42408 08-Jan-1999 eivind

Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as
discussed on -hackers.

Introduce 'KASSERT(assertion, ("panic message", args))' for simple
check + panic.

Reviewed by: msmith


# 41931 19-Dec-1998 julian

Reviewed by: Luoqi Chen, Jordan Hubbard
Submitted by: "Richard Seaman, Jr." <lists@tar.com>
Obtained from: linux :-)

Code to allow Linux Threads to run under FreeBSD.

By default not enabled
This code is dependent on the conditional
COMPAT_LINUX_THREADS (suggested by Garret)
This is not yet a 'real' option but will be within some number of hours.


# 41446 01-Dec-1998 eivind

Check return value of malloc() in expand_name.

Reviewed by: sef


# 41086 11-Nov-1998 truckman

Installed the second patch attached to kern/7899 with some changes suggested
by bde, a few other tweaks to get the patch to apply cleanly again and
some improvements to the comments.

This change closes some fairly minor security holes associated with
F_SETOWN, fixes a few bugs, and removes some limitations that F_SETOWN
had on tty devices. For more details, see the description on the PR.

Because this patch increases the size of the proc and pgrp structures,
it is necessary to re-install the includes and recompile libkvm,
the vinum lkm, fstat, gcore, gdb, ipfilter, ps, top, and w.

PR: kern/7899
Reviewed by: bde, elvind


# 40550 21-Oct-1998 jdp

Eliminate a superfluous comment.


# 39200 14-Sep-1998 jdp

Remove includes that are no longer needed, now that the core dumping
code has been moved into the respective imgact_xxx.c sources.


# 39154 14-Sep-1998 jdp

Add provisions for variant core dump file formats, depending on the
object format of the executable being dumped. This is the first
step toward producing ELF core dumps in the proper format. I will
commit the code to generate the ELF core dumps Real Soon Now. In
the meantime, ELF executables won't dump core at all. That is
probably no less useful than dumping a.out-style core dumps as they
have done until now.

Submitted by: Alex <garbanzo@hooked.net> (with very minor changes by me)


# 37931 28-Jul-1998 joerg

Make the logging of abnormally exiting processes optional by a sysctl.
PR: kern/1711
Submitted by: Nick Sayer <nsayer@kfu.com>


# 37649 15-Jul-1998 bde

Cast pointers to uintptr_t/intptr_t instead of to u_long/long,
respectively. Most of the longs should probably have been
u_longs, but this changes is just to prevent warnings about
casts between pointers and integers of different sizes, not
to fix poorly chosen types.


# 37496 08-Jul-1998 sef

Add support for run-time configuration of core file names. In a nutshell,
you can specify the corefile name by using:

sysctl -w kern.corefile="format"

where format is a pathname (relative or absolute -- default is "%N.core"),
with "%N" (process name), "%P" (process ID), and "%U" (user ID) formats.

Reviewed by: Mike Smith, with strong requests by Julian :)


# 37226 28-Jun-1998 dg

Added a sysctl variable kern.sugid_coredump for controlling coredump
behavior of setuid/setgid binaries that defaults to 0 (coredump disabled).


# 36735 07-Jun-1998 dfr

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


# 34020 03-Mar-1998 tegge

Forward the signal if the process runs on a different CPU. This reduces
the signal handling latency for cpu-bound processes that performs very
few system calls.

The IPI for forcing an additional software trap is no longer dependent upon
BETTER_CLOCK being defined.


# 33134 06-Feb-1998 eivind

Back out DIAGNOSTIC changes.


# 33108 04-Feb-1998 eivind

Turn DIAGNOSTIC into a new-style option.


# 31778 16-Dec-1997 eivind

Make COMPAT_43 and COMPAT_SUNOS new-style options.


# 31564 06-Dec-1997 sef

Changes to allow event-based process monitoring and control.


# 30994 06-Nov-1997 phk

Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.


# 29340 13-Sep-1997 joerg

Implement SA_NOCLDWAIT.

The implementation is done (unlike what i've originally been
contemplating) by reparenting kids of processes that have the
appropriate bit set to PID 1, and let PID 1 handle the zombie. This
is far less problematical than what would seem to be ``doing it
right'', for a number of reasons.

Of our currently shipping PID-1-intended programs, 50 % fail the above
assumption. ;-) (Read this: sysinstall doesn't do it right. This is
no problem as long as no program called by sysinstall actually uses
SA_NOCLDWAIT.)

ToDo: . clarify the correct SA_* flag inheritance, compared
to other systems,
. decide whether the compat cruft (osigvec(9)) should
deal with new system additions or not,
. merge OpenBSD's SA_SIGINFO implementation. ;)
Reviewed by: bde


# 29041 02-Sep-1997 bde

Removed unused #includes.


# 28770 25-Aug-1997 bde

Finished staticizing.


# 24131 23-Mar-1997 bde

Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined.
Fixed everything that depended on getting fcntl.h stuff from the wrong
place. Most things don't depend on file.h stuff at all.


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 22521 10-Feb-1997 dyson

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 20028 29-Nov-1996 bde

Fixed sigaction() for SIGKILL and SIGSTOP. Reading the old action now
succeeds. Writing an action now succeeds iff the handler isn't changed.
(POSIX allows attempts to change the handler to be ignored or cause an
error. Changing other parts of the action is allowed (except attempts
to mask unmaskable signals are silently ignored as usual).)

Found by: NIST-PCTS


# 19027 18-Oct-1996 dg

Kill unnecessary test in coredump() that wasn't removed in rev 1.19
when the check for P_SUGID was added.


# 17045 09-Jul-1996 ache

Log not exited signal only, but the fact that core dumped (or not) too


# 15494 01-May-1996 bde

Removed unnecessary #includes from <sys/imgact.h> so that it is
self-sufficient and added explicit #includes where required.


# 14928 30-Mar-1996 peter

Correct the handling of NOCLDSTOP when using sigvec()
Make the SA_NODEFER handling more correct, previously if you called
sigaction to set a handler and had SA_NODEFER set, and manually masked
the signal itself in sa_mask, and when you read the settings back later,
you'd find SA_NODEFER incorrectly cleared.

Pointed out by: bde


# 14630 15-Mar-1996 peter

Actually implement SA_RESETHAND - some of the sigaction code recognised it
but didn't actually do anything with it (*blush*).

This should fix bde's test case where the test program set SA_RESETHAND
and when reading it back, it was gone.

Tweak/optimize SA_NODEFER so that the implementation is a little simpler
and does not incur (slight) overhead for every signal at delivery time.


# 14529 11-Mar-1996 hsu

From Lite2: proc LIST changes.
Reviewed by: david & bde


# 14504 11-Mar-1996 hsu

From Lite2: change code parameter to u_long and initialize ps_sig.
Reviewed by: davidg & bde


# 14331 02-Mar-1996 peter

Mega-commit for Linux emulator update.. This has been stress tested under
netscape-2.0 for Linux running all the Java stuff. The scrollbars are now
working, at least on my machine. (whew! :-)

I'm uncomfortable with the size of this commit, but it's too
inter-dependant to easily seperate out.

The main changes:

COMPAT_LINUX is *GONE*. Most of the code has been moved out of the i386
machine dependent section into the linux emulator itself. The int 0x80
syscall code was almost identical to the lcall 7,0 code and a minor tweak
allows them to both be used with the same C code. All kernels can now
just modload the lkm and it'll DTRT without having to rebuild the kernel
first. Like IBCS2, you can statically compile it in with "options LINUX".

A pile of new syscalls implemented, including getdents(), llseek(),
readv(), writev(), msync(), personality(). The Linux-ELF libraries want
to use some of these.

linux_select() now obeys Linux semantics, ie: returns the time remaining
of the timeout value rather than leaving it the original value.

Quite a few bugs removed, including incorrect arguments being used in
syscalls.. eg: mixups between passing the sigset as an int, vs passing
it as a pointer and doing a copyin(), missing return values, unhandled
cases, SIOC* ioctls, etc.

The build for the code has changed. i386/conf/files now knows how
to build linux_genassym and generate linux_assym.h on the fly.

Supporting changes elsewhere in the kernel:

The user-mode signal trampoline has moved from the U area to immediately
below the top of the stack (below PS_STRINGS). This allows the different
binary emulations to have their own signal trampoline code (which gets rid
of the hardwired syscall 103 (sigreturn on BSD, syslog on Linux)) and so
that the emulator can provide the exact "struct sigcontext *" argument to
the program's signal handlers.

The sigstack's "ss_flags" now uses SS_DISABLE and SS_ONSTACK flags, which
have the same values as the re-used SA_DISABLE and SA_ONSTACK which are
intended for sigaction only. This enables the support of a SA_RESETHAND
flag to sigaction to implement the gross SYSV and Linux SA_ONESHOT signal
semantics where the signal handler is reset when it's triggered.

makesyscalls.sh no longer appends the struct sysentvec on the end of the
generated init_sysent.c code. It's a lot saner to have it in a seperate
file rather than trying to update the structure inside the awk script. :-)

At exec time, the dozen bytes or so of signal trampoline code are copied
to the top of the user's stack, rather than obtaining the trampoline code
the old way by getting a clone of the parent's user area. This allows
Linux and native binaries to freely exec each other without getting
trampolines mixed up.


# 13788 31-Jan-1996 dg

Improved killproc() log message and made it and the other similar message
tolerant of p_ucred being invalid. Starting using killproc() where
appropriate.


# 13203 03-Jan-1996 wollman

Converted two options over to the new scheme: USER_LDT and KTRACE.


# 12819 14-Dec-1995 phk

A Major staticize sweep. Generates a couple of warnings that I'll deal
with later.
A number of unused vars removed.
A number of unused procs removed or #ifdefed.


# 12662 07-Dec-1995 dg

Untangled the vm.h include file spaghetti.


# 12368 18-Nov-1995 bde

Cleaned up SA_NODEFER changes.

Added prototypes.


# 12221 12-Nov-1995 bde

Included <sys/sysproto.h> to get central declarations for syscall args
structs and prototypes for syscalls.

Ifdefed duplicated decentralized declarations of args structs. It's
convenient to have this visible but they are hard to maintain. Some
are already different from the central declarations. 4.4lite2 puts
them in comments in the function headers but I wanted to avoid the
large changes for that.


# 11573 19-Oct-1995 swallace

Implement SA_NODEFER sa_flag for sigaction():
Add SA_NODEFER define to signal.h
Add ps_nodefer field to struct sigacts in signalvar.h.
Add code to kern_sig.c to handle SA_NODEFER.

If flag is set, when the signal is delivered, it is not masked automatically
from receiving the same signal again.

Reviewed by: wollman, bde


# 8876 30-May-1995 rgrimes

Remove trailing whitespace.


# 7090 16-Mar-1995 bde

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


# 5999 28-Jan-1995 ats

Correct a name of one structure member in the sigaltstack structure.
Now it matches the man page and also the only other commercial implementation
i have found so far ( Solaris 2.x).
Changed the name from ss_base to ss_sp.


# 4210 06-Nov-1994 ache

Security nitpicking: don't make *.core world readable


# 3485 09-Oct-1994 phk

Cosmetics. related to getting prototypes into view.


# 3220 29-Sep-1994 ache

Log SA_CORE signals
Obtained from: FreeBSD 1.x


# 3098 25-Sep-1994 phk

While in the real world, I had a bad case of being swapped out for a lot of
cycles. While waiting there I added a lot of the extra ()'s I have, (I have
never used LISP to any extent). So I compiled the kernel with -Wall and
shut up a lot of "suggest you add ()'s", removed a bunch of unused var's
and added a couple of declarations here and there. Having a lap-top is
highly recommended. My kernel still runs, yell at me if you kernel breaks.


# 2921 20-Sep-1994 bde

Don't use SIG_DFL or SIG_IGN for case label expressions. ANSI requires
such expressions to have integral type. "gcc -ansi -pedantic -W..."
fails to diagnose this constraint error.


# 1817 02-Aug-1994 dg

Added $Id$


# 1549 25-May-1994 rgrimes

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# 1542 24-May-1994 rgrimes

This commit was generated by cvs2svn to compensate for changes in r1541,
which included commits to RCS files with non-trunk default branches.


# 1541 24-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources