History log of /freebsd-11-stable/sys/kern/kern_shutdown.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 352719 25-Sep-2019 avg

MFC r351810: shutdown_halt: make sure that watchdog timer is stopped


# 344905 08-Mar-2019 jhb

MFC 340020: Don't enter DDB for fatal traps before panic by default.

Add a new 'debugger_on_trap' knob separate from 'debugger_on_panic'
and make the calls to kdb_trap() in MD fatal trap handlers prior to
calling panic() conditional on this new knob instead of
'debugger_on_panic'. Disable the new knob by default. Developers who
wish to recover from a fatal fault by adjusting saved register state
and retrying the faulting instruction can still do so by enabling the
new knob. However, for the more common case this makes the user
experience for panics due to a fatal fault match the user experience
for other panics, e.g. 'c' in DDB will generate a crash dump and
reboot the system rather than being stuck in an infinite loop of fatal
fault messages and DDB prompts.


# 331736 29-Mar-2018 kib

MFC r331375:
Do not send signals to init directly from shutdown_nice(9), do it from
the task context.


# 331722 29-Mar-2018 eadler

Revert r330897:

This was intended to be a non-functional change. It wasn't. The commit
message was thus wrong. In addition it broke arm, and merged crypto
related code.

Revert with prejudice.

This revert skips files touched in r316370 since that commit was since
MFCed. This revert also skips files that require $FreeBSD$ property
changes.

Thank you to those who helped me get out of this mess including but not
limited to gonzo, kevans, rgrimes.

Requested by: gjb (re)


# 330897 14-Mar-2018 eadler

Partial merge of the SPDX changes

These changes are incomplete but are making it difficult
to determine what other changes can/should be merged.

No objections from: pfg


# 321418 24-Jul-2017 markj

MFC r320918, r321035:
Have mkdumpheader() handle version string truncation.


# 313119 03-Feb-2017 markj

MFC r312199:
Stop the scheduler upon panic even in non-SMP kernels.


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
# 302349 05-Jul-2016 glebius

Compile in the kassert_panic() function with INVARIANT_SUPPORT
option, not INVARIANTS. The function is required if we want
to load in a module that is compiled with INVARIANTS.

Reviewed by: jhb
Approved by: re (gjb)


# 301522 06-Jun-2016 bz

Implement a `show panic` command to DDB which will helpfully print the
panic string again if set, in case it scrolled out of the active
window. This avoids having to remember the symbol name.

Also add a show callout <addr> command to DDB in order to inspect
some struct callout fields in case of panics in the callout code.
This may help to see if there was memory corruption or to further
ease debugging problems.

Obtained from: projects/vnet
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Reviewed by: jhb (comment only on the show panic initally)
Differential Revision: https://reviews.freebsd.org/D4527


# 301040 31-May-2016 trasz

Cosmetics - add missing space after ellipses in shutdown messages.

MFC after: 1 month
Sponsored by: The FreeBSD Foundation


# 298076 15-Apr-2016 cem

Add 4Kn kernel dump support

(And 4Kn minidump support, but only for amd64.)

Make sure all I/O to the dump device is of the native sector size. To
that end, we keep a native sector sized buffer associated with dump
devices (di->blockbuf) and use it to pad smaller objects as needed (e.g.
kerneldumpheader).

Add dump_write_pad() as a convenience API to dump smaller objects with
zero padding. (Rather than pull in NPM leftpad, we wrote our own.)

Savecore(1) has been updated to deal with these dumps. The format for
512-byte sector dumps should remain backwards compatible.

Minidumps for other architectures are left as an exercise for the
reader.

PR: 194279
Submitted by: ambrisko@
Reviewed by: cem (earlier version), rpokala
Tested by: rpokala (4Kn/512 except 512 fulldump), cem (512 fulldump)
Relnotes: yes
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D5848


# 288446 01-Oct-2015 cperciva

Disable suspend when we're shutting down. This solves the "tell FreeBSD
to shut down; close laptop lid" scenario which otherwise tended to end
with a laptop overheating or the battery dying.

The implementation uses a new sysctl, kern.suspend_blocked; init(8) sets
this while rc.suspend runs, and the ACPI sleep code ignores requests while
the sysctl is set.

Discussed on: freebsd-acpi (35 emails)
MFC after: 1 week


# 287964 18-Sep-2015 trasz

Kernel part of reroot support - a way to change rootfs without reboot.

Note that the mountlist manipulations are somewhat fragile, and not very
pretty. The reason for this is to avoid changing vfs_mountroot(), which
is (obviously) rather mission-critical, but not very well documented,
and thus hard to test properly. It might be possible to rework it to use
its own simple root mount mechanism instead of vfs_mountroot().

Reviewed by: kib@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D2698


# 285993 29-Jul-2015 jeff

- Make 'struct buf *buf' private to vfs_bio.c. Having a global variable
'buf' is inconvenient and has lead me to some irritating to discover
bugs over the years. It also makes it more challenging to refactor
the buf allocation system.
- Move swbuf and declare it as an extern in vfs_bio.c. This is still
not perfect but better than it was before.
- Eliminate the unused ffs function that relied on knowledge of the buf
array.
- Move the shutdown code that iterates over the buf array into vfs_bio.c.

Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division


# 283115 19-May-2015 asomers

Properly null-terminate strings in a kernel dump header. A version string
longer than 192 bytes will cause the version field of a dump header to
overflow. strncpy doesn't null terminate it, so savecore will print a
corrupted info file. Using strlcpy fixes the bug.

Differential Revision: https://reviews.freebsd.org/D2560
Reviewed by: markj
MFC after: 3 weeks
Sponsored by: Spectra Logic


# 282332 01-May-2015 markj

Remove a stale reference to the stop_scheduler_on_panic tunable, which
itself was removed in r243515.

MFC after: 1 week


# 281915 24-Apr-2015 markj

Make vpanic() externally visible so that it can be called as part of the
DTrace panic() action.

Differential Revision: https://reviews.freebsd.org/D2349
Reviewed by: avg
MFC after: 2 weeks
Sponsored by: EMC / Isilon Storage Division


# 276772 06-Jan-2015 markj

Factor out duplicated code from dumpsys() on each architecture into generic
code in sys/kern/kern_dump.c. Most dumpsys() implementations are nearly
identical and simply redefine a number of constants and helper subroutines;
a generic implementation will make it easier to implement features around
kernel core dumps. This change does not alter any minidump code and should
have no functional impact.

PR: 193873
Differential Revision: https://reviews.freebsd.org/D904
Submitted by: Conrad Meyer <conrad.meyer@isilon.com>
Reviewed by: jhibbits (earlier version)
Sponsored by: EMC / Isilon Storage Division


# 274366 11-Nov-2014 pjd

Add missing privilege check when setting the dump device. Before that change it
was possible for a regular user to setup the dump device if he had write access
to the given device. In theory it is a security issue as user might get access
to kernel's memory after provoking kernel crash, but in practise it is not
recommended to give regular users direct access to storage devices.

Rework the code so that we do privileges check within the set_dumper() function
to avoid similar problems in the future.

Discussed with: secteam


# 269105 25-Jul-2014 gavin

Add error return to dumpsys(), and use it in doadump().

This commit does not add error returns to minidumpsys() or
textdump_dumpsys(); those can also be added later.

Submitted by: Conrad Meyer (EMC / Isilon storage division)


# 267992 28-Jun-2014 hselasky

Pull in r267961 and r267973 again. Fix for issues reported will follow.


# 267985 27-Jun-2014 gjb

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


# 267961 27-Jun-2014 hselasky

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


# 264240 07-Apr-2014 ed

Thinko: don't forget to apply 'howto' in case init(8) isn't running.


# 264237 07-Apr-2014 ed

Clean up shutdown_nice(). Just send the right signal to init(8).

Right now, init(8) cannot distinguish between an ACPI power button press
or a Ctrl+Alt+Del sequence on the keyboard. This is because
shutdown_nice() sends SIGINT to init(8) unconditionally, but later
modifies the arguments to reboot(2) to force a certain behaviour.

Instead of doing this, patch up the code to just forward the appropriate
signal to userspace. SIGUSR1 and SIGUSR2 can already be used to halt the
system.

While there, move waittime to the function where it's used; kern_reboot().


# 258956 05-Dec-2013 cperciva

Make panic_reboot_wait_time static.

Submitted by: jhb


# 258893 03-Dec-2013 cperciva

Add a new sysctl / loader tunable kern.panic_reboot_wait_time which
defaults to PANIC_REBOOT_WAIT_TIME (a long-existing kernel config
setting). Use this now-variable value in place of the defined constant
to control how long the system waits after a panic before rebooting.


# 248084 09-Mar-2013 attilio

Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
At this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit. Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


# 244105 10-Dec-2012 alfred

Switch the hardwired WITNESS panics to kassert_panic.

This is an ongoing effort to provide runtime debug information
useful in the field that does not panic existing installations.

This gives us the flexibility needed when shipping images to a
potentially large audience with WITNESS enabled without worrying
about formerly non-fatal LORs hurting a release.

Sponsored by: iXsystems


# 244099 10-Dec-2012 alfred

allow KASSERT to enter KDB.


# 243980 07-Dec-2012 alfred

Allow KASSERT to log instead of panic.

This is to allow debug images to be used without taking down the
system when non-fatal asserts are hit.

The following sysctls are added:

debug.kassert.warn_only: 1 = log, 0 = panic

debug.kassert.do_ktr: set to a ktr mask for logging via KTR

debug.kassert.do_log: 1 = log, 0 = quiet

debug.kassert.warnings: stats, number of kasserts hit

debug.kassert.log_panic_at:
number of kasserts before we actually panic, 0 = never

debug.kassert.log_pps_limit: pps limit for log messages

debug.kassert.log_mute_at: stop warning after N kasserts, 0 = never stop

debug.kassert.kassert: set this sysctl to trigger a kassert

Discussed with: scottl, gnn, marcel
Sponsored by: iXsystems


# 243515 25-Nov-2012 avg

remove stop_scheduler_on_panic knob

There has not been any complaints about the default behavior, so there
is no need to keep a knob that enables the worse alternative.

Now that the hard-stopping of other CPUs is the only behavior, the panic_cpu
spinlock-like logic can be dropped, because only a single CPU is
supposed to win stop_cpus_hard(other_cpus) race and proceed past that
call.

MFC after: 1 month


# 242489 02-Nov-2012 alfred

Merge 242488, better use of strlcpy.

Submitted by: Eric van Gyzen <eric@vangyzen.net>


# 242439 01-Nov-2012 alfred

Provide a device name in the sysctl tree for programs to query the
state of crashdump target devices.

This will be used to add a "-l" (ell) flag to dumpon(8) to list the
currently configured dumpdev.

Reviewed by: phk


# 236503 03-Jun-2012 avg

free wdog_kern_pat calls in post-panic paths from under SW_WATCHDOG

Those calls are useful with hardware watchdog drivers too.

MFC after: 3 weeks


# 235777 22-May-2012 harti

Make dumptid non-static. It is used by libkvm to detect whether
this is a VNET-kernel or not. gcc used to put the static symbol into
the symbol table, clang does not. This fixes the 'netstat: no namelist'
error seen on clang+VNET systems.


# 230643 28-Jan-2012 attilio

Avoid to check the same cache line/variable from all the locking
primitives by breaking stop_scheduler into a per-thread variable.
Also, store the new td_stopsched very close to td_*locks members as
they will be accessed mostly in the same codepaths as td_stopsched and
this results in avoiding a further cache-line pollution, possibly.

STOP_SCHEDULER() was pondered to use a new 'thread' argument, in order to
take advantage of already cached curthread, but in the end there should
not really be a performance benefit, while introducing a KPI breakage.

In collabouration with: flo
Reviewed by: avg
MFC after: 3 months (or never)
X-MFC: r228424


# 229854 09-Jan-2012 avg

enable stop_scheduler_on_panic by default

My plan is to make this behavior unconditional before 10.0 release.

X-MFC after: r228424 (if ever)


# 228632 17-Dec-2011 avg

introduce cngrab/cnungrab stub calls in some places where they make sense

MFC after: 2 months


# 228487 14-Dec-2011 obrien

Match other formatting.


# 228475 13-Dec-2011 obrien

Disallow various debug.kdb sysctl's when securelevel is raised.

PR: 161350


# 228449 12-Dec-2011 eadler

Document a large number of currently undocumented sysctls. While here
fix some style(9) issues and reduce redundancy.

PR: kern/155491
PR: kern/155490
PR: kern/155489
Submitted by: Galimov Albert <wtfcrap@mail.ru>
Approved by: bde
Reviewed by: jhb
MFC after: 1 week


# 228424 11-Dec-2011 avg

panic: add a switch and infrastructure for stopping other CPUs in SMP case

Historical behavior of letting other CPUs merily go on is a default for
time being. The new behavior can be switched on via
kern.stop_scheduler_on_panic tunable and sysctl.

Stopping of the CPUs has (at least) the following benefits:
- more of the system state at panic time is preserved intact
- threads and interrupts do not interfere with dumping of the system
state

Only one thread runs uninterrupted after panic if stop_scheduler_on_panic
is set. That thread might call code that is also used in normal context
and that code might use locks to prevent concurrent execution of certain
parts. Those locks might be held by the stopped threads and would never
be released. To work around this issue, it was decided that instead of
explicit checks for panic context, we would rather put those checks
inside the locking primitives.

This change has substantial portions written and re-written by attilio
and kib at various times. Other changes are heavily based on the ideas
and patches submitted by jhb and mdf. bde has provided many insights
into the details and history of the current code.

The new behavior may cause problems for systems that use a USB keyboard
for interfacing with system console. This is because of some unusual
locking patterns in the ukbd code which have to be used because on one
hand ukbd is below syscons, but on the other hand it has to interface
with other usb code that uses regular mutexes/Giant for its concurrency
protection. Dumping to USB-connected disks may also be affected.

PR: amd64/139614 (at least)
In cooperation with: attilio, jhb, kib, mdf
Discussed with: arch@, bde
Tested by: Eugene Grosbein <eugen@grosbein.net>,
gnn,
Steven Hartland <killing@multiplay.co.uk>,
glebius,
Andrew Boyer <aboyer@averesystems.com>
(various versions of the patch)
MFC after: 3 months (or never)


# 227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# 225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


# 225516 12-Sep-2011 attilio

dump_write() returns ENXIO if the dump is trying to be written outside
of the device boundry.
While this is generally ok, the problem is that all the consumers
handle similar cases (and expect to catch) ENOSPC for this (for a
reference look at minidumpsys() and dumpsys() constructions). That
ends up in consumers not recognizing the issue and amd64 failing to
retry if the number of pages grows up during minidump.
Fix this by returning ENOSPC in dump_write() and while here add some
more diagnostic on involved values.

Sponsored by: Sandvine Incorporated
In collabouration with: emaste
Approved by: re (kib)
MFC after: 10 days


# 225448 08-Sep-2011 attilio

Improve the informations reported in case of busy buffers during the shutdown:
- Axe out the SHOW_BUSYBUFS option and uses a tunable for selectively
enable/disable it, which is defaulted for not printing anything (0
value) but can be changed for printing (1 value) and be verbose (2
value)
- Improves the informations outputed: right now, there is no track of
the actual struct buf object or vnode which are referenced by the
shutdown process, but it is printed the related struct bufobj object
which is not really helpful
- Add more verbosity about the state of the struct buf lock and the
vnode informations, with the latter to be activated separately by the
sysctl

Sponsored by: Sandvine Incorporated
Reviewed by: emaste, kib
Approved by: re (ksmith)
MFC after: 10 days


# 224307 25-Jul-2011 avg

remove RESTARTABLE_PANICS option

This is done per request/suggestion from John Baldwin
who introduced the option. Trying to resume normal
system operation after a panic is very unpredictable
and dangerous. It will become even more dangerous
when we allow a thread in panic(9) to penetrate all
lock contexts.
I understand that the only purpose of this option was
for testing scenarios potentially resulting in panic.

Suggested by: jhb
Reviewed by: attilio, jhb
X-MFC-After: never
Approved by: re (kib)


# 222865 08-Jun-2011 attilio

In the current code, a double panic condition may lead to dumps
interleaving.
Signal dumping to happen only for the first panic which should be the
most important.

Sponsored by: Sandvine Incorporated
Submitted by: Nima Misaghian (nmisaghian AT sandvine DOT com)
MFC after: 2 weeks


# 222801 06-Jun-2011 marcel

Fix making kernel dumps from the debugger by creating a command
for it. Do not not expect a developer to call doadump(). Calling
doadump does not necessarily work when it's declared static. Nor
does it necessarily do what was intended in the context of text
dumps. The dump command always creates a core dump.

Move printing of error messages from doadump to the dump command,
now that we don't have to worry about being called from DDB.


# 221173 28-Apr-2011 attilio

Add the watchdogs patting during the (shutdown time) disk syncing and
disk dumping.
With the option SW_WATCHDOG on, these operations are doomed to let
watchdog fire, fi they take too long.

I implemented the stubs this way because I really want wdog_kern_*
KPI to not be dependant by SW_WATCHDOG being on (and really, the option
only enables watchdog activation in hardclock) and also avoid to
call them when not necessary (avoiding not-volountary watchdog
activations).

Sponsored by: Sandvine Incorporated
Discussed with: emaste, des
MFC after: 2 weeks


# 214279 24-Oct-2010 brucec

Mostly revert r203420, and add similar functionality into ada(4) since the
existing code caused problems with some SCSI controllers.

A new sysctl kern.cam.ada.spindown_shutdown has been added that controls
whether or not to spin-down disks when shutting down.
Spinning down the disks unloads/parks the heads - this is
much better than removing power when the disk is still
spinning because otherwise an Emergency Unload occurs which may cause damage
to the actuator.

PR: kern/140752
Submitted by: olli
Reviewed by: arundel
Discussed with: mav
MFC after: 2 weeks


# 214004 18-Oct-2010 marcel

Rename boot() to kern_reboot() and make it visible outside of
kern_shutdown.c. This makes it easier for emulators and other
parts of the kernel to initiate a reboot.


# 213648 09-Oct-2010 avg

panic_cpu variable should be volatile

This is to prevent caching of its value in a register when it is checked
and modified by multiple CPUs in parallel.
Also, move the variable into the scope of the only function that uses it.

Reviewed by: jhb
Hint from: mdf
MFC after: 1 week


# 213322 01-Oct-2010 avg

sysctls in kern_shutdown: add twin tunables

also make couple of sysctl-controlled variables static

Reviewed by: rwatson
MFC after: 1 week


# 206897 20-Apr-2010 attilio

Fix compilation in the !SMP case.
Keep the interrupts disabled in order to avoid preemption problems.

Reported by: tinderbox, b.f. <bf1783 at googlemail dot com>
MFC: 2 weeks
X-MFC: r206878


# 206878 19-Apr-2010 attilio

Fix a deadlock in the shutdown code:
When performing a smp_rendezvous() or more likely, on amd64 and i386,
a smp_tlb_shootdown() the caller will end up with the smp_ipi_mtx
spinlock held, busy-waiting for other CPUs to acknowledge the operation.
As long as CPUs are suspended (via cpu_reset()) between the active mask
read and IPI sending there can be a deadlock where the caller will wait
forever for a dead CPU to acknowledge the operation.
Please note that on CPU0 that is going to be someway heavier because of
the spinlocks being disabled earlier than quitting the machine.

Fix this bug by calling cpu_reset() with the smp_ipi_mtx held.
Note that it is very likely that a saner offline/online CPUs mechanism
will help heavilly in fixing similar cases as it is likely more bugs
of this type may arise in the future.

Reported by: rwatson
Discussed with: jhb
Tested by: rnoland, Giovanni Trematerra
<giovanni dot trematerra at gmail dot com>
MFC: 2 weeks

Special deciation to: anyone who made possible to have 16-ways machines
in Netperf


# 203420 03-Feb-2010 mav

MFp4:
Make CAM to stop all attached devices on system shutdown.
It allows devices to park heads, reducing stress on power loss.
Add `kern.cam.power_down` tunable and sysctl to controll it.


# 198408 23-Oct-2009 jhb

Don't bother copying the name of a kproc or kthread out into a temporary
array just to pass that array to printf(). kproc and kthread names are
NUL-terminated and can be printed using printf() directly.

Reviewed by: bde


# 197071 10-Sep-2009 n_hibma

Add a comment on the consequences of reducing the poweroff delay


# 196196 13-Aug-2009 attilio

* Completely Remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions. This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Adds an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by: jhb
Tested by: pho, bz, rink
Approved by: re (kib)


# 194118 13-Jun-2009 jamie

Rename the host-related prison fields to be the same as the host.*
parameters they represent, and the variables they replaced, instead of
abbreviated versions of them.

Approved by: bz (mentor)


# 193511 05-Jun-2009 rwatson

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


# 193066 29-May-2009 jamie

Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex. Jails may
have their own host information, or they may inherit it from the
parent/system. The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL. The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by: bz (mentor)


# 190684 04-Apr-2009 marcel

PowerPC, meet kernel core dumps. The support is based
on a generic dumper that creates an ELF core file and
uses PMAP functions to scan and iterate over memory
chunks, as well as handle memory mappings used during
dumping.
the PMAP layer can choose to return physical memory
chunks or virtual memory chunks. For minidumps, the
chunks should be virtual.

The default MMU I/F implementation for the scan_md()
method returns NULL. Thus, when a PMAP implementation
does not implement the required methods, an empty
core file is created. Here, empty means having an ELF
header only.

Obtained from: Juniper Networks


# 185234 23-Nov-2008 dwmalone

It's possible that the dump device has gone away after it was
configured, change the message to let people know this is a
possibility. I've slightly changed the message from the one
submitted by Pekka to keep the printf on one line.

Submitted by: Pekka Savola <pekkas@netcore.fi>


# 183527 01-Oct-2008 peter

Collect N identical (or near identical) mkdumpheader() implementations into
one, as threatened in the comment. Textdump magic can be passed in.


# 183412 27-Sep-2008 kib

If the panic thread is preempted after setting panicstr but before
setting TDF_INPANIC then it will never be rescheduled again. Wrap
setting the panic condition with the critical section.

Noted and reviewed by: tegge
MFC after: 1 week


# 177253 16-Mar-2008 rwatson

In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation. This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after: 1 month
Discussed with: imp, rink


# 176788 04-Mar-2008 ru

Make it possible to continue working after calling doadump()
manually from debugger. (This got broken in rev. 1.122.)


# 175768 28-Jan-2008 ru

Add a wrapper function that bound checks writes to the dump device.


# 175486 19-Jan-2008 attilio

- Introduce the function lockmgr_recursed() which returns true if the
lockmgr lkp, when held in exclusive mode, is recursed
- Introduce the function BUF_RECURSED() which does the same for bufobj
locks based on the top of lockmgr_recursed()
- Introduce the function BUF_ISLOCKED() which works like the counterpart
VOP_ISLOCKED(9), showing the state of lockmgr linked with the bufobj

BUF_RECURSED() and BUF_ISLOCKED() entirely replace the usage of bogus
BUF_REFCNT() in a more explicative and SMP-compliant way.
This allows us to axe out BUF_REFCNT() and leaving the function
lockcount() totally unused in our stock kernel. Further commits will
axe lockcount() as well as part of lockmgr() cleanup.

KPI results, obviously, broken so further commits will update manpages
and freebsd version.

Tested by: kris (on UFS and NFS)


# 174921 26-Dec-2007 rwatson

Add textdump(4) facility, which provides an alternative form of kernel
dump using mechanically generated/extracted debugging output rather than
a simple memory dump. Current sources of debugging output are:

- DDB output capture buffer, if there is captured output to save
- Kernel message buffer
- Kernel configuration, if included in kernel
- Kernel version string
- Panic message

Textdumps are stored in swap/dump partitions as with regular dumps, but
are laid out as ustar files in order to allow multiple parts to be stored
as a stream of sequentially written blocks. Blocks are written out in
reverse order, as the size of a textdump isn't known a priori. As with
regular dumps, they will be extracted using savecore(8).

One new DDB(4) command is added, "textdump", which accepts "set",
"unset", and "status" arguments. By default, normal kernel dumps are
generated unless "textdump set" is run in order to schedule a textdump.
It can be canceled using "textdump unset" to restore generation of a
normal kernel dump.

Several sysctls exist to configure aspects of textdumps;
debug.ddb.textdump.pending can be set to check whether a textdump is
pending, or set/unset in order to control whether the next kernel dump
will be a textdump from userspace.

While textdumps don't have to be generated as a result of a DDB script
run automatically as part of a kernel panic, this is a particular useful
way to use them, as instead of generating a complete memory dump, a
simple transcript of an automated DDB session can be captured using the
DDB output capture and textdump facilities. This can be used to
generate quite brief kernel bug reports rich in debugging information
but not dependent on kernel symbol tables or precisely synchronized
source code. Most textdumps I generate are less than 100k including
the full message buffer. Using textdumps with an interactive debugging
session is also useful, with capture being enabled/disabled in order to
record some but not all of the DDB session.

MFC after: 3 months


# 174898 25-Dec-2007 rwatson

Add a new 'why' argument to kdb_enter(), and a set of constants to use
for that argument. This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.

Assign approximate why values to all current consumers of the
kdb_enter() interface.


# 173004 26-Oct-2007 julian

Introduce a way to make pure kernal threads.
kthread_add() takes the same parameters as the old kthread_create()
plus a pointer to a process structure, and adds a kernel thread
to that process.

kproc_kthread_add() takes the parameters for kthread_add,
plus a process name and a pointer to a pointer to a process instead of just
a pointer, and if the proc * is NULL, it creates the process to the
specifications required, before adding the thread to it.

All other old kthread_xxx() calls return, but act on (struct thread *)
instead of (struct proc *). One reason to change the name is so that
any old kernel modules that are lying around and expect kthread_create()
to make a process will not just accidentally link.

fix top to show kernel threads by their thread name in -SH mode
add a tdnam formatting option to ps to show thread names.

make all idle threads actual kthreads and put them into their own idled process.
make all interrupt threads kthreads and put them in an interd process
(mainly for aesthetic and accounting reasons)
rename proc 0 to be 'kernel' and it's swapper thread is now 'swapper'

man page fixes to follow.


# 172930 24-Oct-2007 rwatson

Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

mac_<object>_<method/action>
mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer


# 172836 20-Oct-2007 julian

Rename the kthread_xxx (e.g. kthread_create()) calls
to kproc_xxx as they actually make whole processes.
Thos makes way for us to add REAL kthread_create() and friends
that actually make theads. it turns out that most of these
calls actually end up being moved back to the thread version
when it's added. but we need to make this cosmetic change first.

I'd LOVE to do this rename in 7.0 so that we can eventually MFC the
new kthread_xxx() calls.


# 170307 04-Jun-2007 jeff

Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
sychronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


# 167211 04-Mar-2007 rwatson

Remove 'MPSAFE' annotations from the comments above most system calls: all
system calls now enter without Giant held, and then in some cases, acquire
Giant explicitly.

Remove a number of other MPSAFE annotations in the credential code and
tweak one or two other adjacent comments.


# 164033 06-Nov-2006 rwatson

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# 163606 22-Oct-2006 rwatson

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


# 157628 10-Apr-2006 pjd

On shutdown try to turn off all swap devices. This way GEOM providers are
properly closed on shutdown.

Requested by: ru
Reviewed by: alc
MFC after: 2 weeks


# 155383 06-Feb-2006 jeff

- Add the global 'rebooting' variable that is used to detect when
boot() has been called.

Sponsored by: Isilon Systems, Inc.
MFC After: 1 week


# 150472 22-Sep-2005 ups

Don't pretend to be thread0 when calling sync().
It confuses the lock manager since in some places thread0 is
then used for vnode locking while curthread is used for vnode unlocking.

Found by: Yahoo!
Reviewed by: ps@,jhb@
MFC after: 3 days


# 149875 08-Sep-2005 truckman

Add a new struct buf flag bit, B_PERSISTENT, and use it to tag
struct bufs that are persistently held by ext2fs. Ignore any buffers
with this flag in the code in boot() that counts "busy" and dirty
buffers and attempts to sync the dirty buffers, which is done before
attempting to unmount all the file systems during shutdown.

This fixes the problem caused by any ext2fs file systems that are
mounted at system shutdown time, which caused boot() to give up on
a non-zero number of buffers and skip the call to vfs_unmountall().
This left all the mounted file systems in a dirty state and caused
them to all require cleanup by fsck on reboot.

Move the two separate copies of the "busy" buffer test in boot()
to a separate function.

Nuke the useless spl() stuff in the ext2fs ULCK_BUF() macro.

Bring the PRINT_BUF_FLAGS definition in sys/buf.h up to date with
this and previous flag changes.

PR: kern/56675, kern/85163
Tested by: "Matthias Andree" matthias.andree at gmx.de
Reviewed by: bde
MFC after: 3 days


# 144929 12-Apr-2005 jeff

- Remove unused include.


# 138217 30-Nov-2004 njl

Replace a printf with a KASSERT that we are indeed running on the BSP.


# 137375 08-Nov-2004 marcel

Bind to cpu0 for boot() processing on all platforms again.


# 137329 07-Nov-2004 njl

Add comments to clarify why we need to run shutdown code on the BSP, update
an old comment about boot() being MI, and note that splhigh() no longer
disables interrupts.


# 137266 05-Nov-2004 peter

Restrict the sched_bind to cpu 0 to i386 and amd64 for now. I forgot that
alpha still doesn't use logical cpu id's.


# 137263 05-Nov-2004 peter

Bind to cpu0 for boot() processing. (Note this is reboot, not startup)
This means we'll always call the event hooks, device_shutdown etc on the
BSP and theoretically means we can de-cruftify the cpu_reset_proxy stuff.


# 137186 04-Nov-2004 phk

Remove buf->b_dev field.


# 136115 04-Oct-2004 phk

Change the perfectly precise message
printf("No buffers busy after final sync");
to
printf("All buffers synced.");
in order to not leave the users wondering if there should be.


# 134649 02-Sep-2004 scottl

Turn PREEMPTION into a kernel option. Make sure that it's defined if
FULL_PREEMPTION is defined. Add a runtime warning to ULE if PREEMPTION is
enabled (code inspired by the PREEMPTION warning in kern_switch.c). This
is a possible MT5 candidate.


# 134479 29-Aug-2004 des

Remove the HW_WDOG option; it serves no purpose.

MFC after: 3 days


# 134089 20-Aug-2004 jhb

Remove some dead code under a straggling APIC_IO #ifdef that I missed
back before 5.2.


# 133763 15-Aug-2004 truckman

Yet another tweak to the shutdown messages in boot():

Don't count busy buffers before the initial call to sync() and
don't skip the initial sync() if no busy buffers were called.
Always call sync() at least once if syncing is requested. This
defers the "Syncing disks, buffers remaining..." message until
after the initial sync() call and the first count of busy
buffers. This backs out changes in kern_shutdown 1.162.

Print a different message when there are no busy buffers after the
initial sync(), which is now the expected situation.

Print an additional message when syncing has completed successfully
in the unusual situation where the work of syncing was done by
boot().

Uppercase one message to make it consistent with all of the other
kernel shutdown messages.

Discussed with: bde (in a much earlier form, prior to 1.162)
Reviewed by: njl (in an earlier form)


# 133418 09-Aug-2004 njl

Skip the syncing disks loop if there are no dirty buffers. Remove a
variable used to flag the initial printf.

Submitted by: truckman (earlier version)


# 132866 29-Jul-2004 njl

Minor message cleanup.


# 132506 21-Jul-2004 rwatson

Don't sync the file system on panic by default. This seems to basically
work very infrequently, and often results in a compound panic which
confuses debugging; locking/SMP have made the layering violation (and
risks) of this more obvious over time.

Discussed with: green, bde, et al.


# 132413 19-Jul-2004 julian

You always spot the typos after you have committed.. Start sentence
with a Cap.


# 132412 19-Jul-2004 julian

Allow the user who calls doadump() from the kernel debugger
to not get a page fault if he has not defined a dump device.
Panic can often not do a dump as it can hang forever in some cases.
The original PR was for amd64 only. This is a generalised version of
that change.

PR: amd64/67712
Submitted by: wjw@withagen.nl <Willen Jan Withagen>


# 132197 15-Jul-2004 alfred

Cleanup shutdown output.


# 132177 15-Jul-2004 alfred

Tidy up system shutdown.


# 132171 15-Jul-2004 njl

Clean up the output on reboot by keeping completion messages on the same
line as the announcement. Someone should probably update the "buffers
remaining" message since we now no longer should have any buffers remaining
at that point.


# 131927 10-Jul-2004 marcel

Update for the KDB framework:
o Make debugging code conditional upon KDB instead of DDB.
o Call kdb_enter() instead of Debugger().
o Call kdb_backtrace() instead of db_print_backtrace() or backtrace().

kern_mutex.c:
o Replace checks for db_active with checks for kdb_active and make
them unconditional.

kern_shutdown.c:
o s/DDB_UNATTENDED/KDB_UNATTENDED/g
o s/DDB_TRACE/KDB_TRACE/g
o Save the TID of the thread doing the kernel dump so the debugger
knows which thread to select as the current when debugging the
kernel core file.
o Clear kdb_active instead of db_active and do so unconditionally.
o Remove backtrace() implementation.

kern_synch.c:
o Call kdb_reenter() instead of db_error().


# 131481 02-Jul-2004 jhb

Implement preemption of kernel threads natively in the scheduler rather
than as one-off hacks in various other parts of the kernel:
- Add a function maybe_preempt() that is called from sched_add() to
determine if a thread about to be added to a run queue should be
preempted to directly. If it is not safe to preempt or if the new
thread does not have a high enough priority, then the function returns
false and sched_add() adds the thread to the run queue. If the thread
should be preempted to but the current thread is in a nested critical
section, then the flag TDF_OWEPREEMPT is set and the thread is added
to the run queue. Otherwise, mi_switch() is called immediately and the
thread is never added to the run queue since it is switch to directly.
When exiting an outermost critical section, if TDF_OWEPREEMPT is set,
then clear it and call mi_switch() to perform the deferred preemption.
- Remove explicit preemption from ithread_schedule() as calling
setrunqueue() now does all the correct work. This also removes the
do_switch argument from ithread_schedule().
- Do not use the manual preemption code in mtx_unlock if the architecture
supports native preemption.
- Don't call mi_switch() in a loop during shutdown to give ithreads a
chance to run if the architecture supports native preemption since
the ithreads will just preempt DELAY().
- Don't call mi_switch() from the page zeroing idle thread for
architectures that support native preemption as it is unnecessary.
- Native preemption is enabled on the same archs that supported ithread
preemption, namely alpha, i386, and amd64.

This change should largely be a NOP for the default case as committed
except that we will do fewer context switches in a few cases and will
avoid the run queues completely when preempting.

Approved by: scottl (with his re@ hat)


# 131473 02-Jul-2004 jhb

- Change mi_switch() and sched_switch() to accept an optional thread to
switch to. If a non-NULL thread pointer is passed in, then the CPU will
switch to that thread directly rather than calling choosethread() to pick
a thread to choose to.
- Make sched_switch() aware of idle threads and know to do
TD_SET_CAN_RUN() instead of sticking them on the run queue rather than
requiring all callers of mi_switch() to know to do this if they can be
called from an idlethread.
- Move constants for arguments to mi_switch() and thread_single() out of
the middle of the function prototypes and up above into their own
section.


# 130640 17-Jun-2004 phk

Second half of the dev_t cleanup.

The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.


# 130164 06-Jun-2004 phk

Remove filename+line number from panic messages.


# 127911 05-Apr-2004 imp

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# 124944 25-Jan-2004 jeff

- Add a flags parameter to mi_switch. The value of flags may be SW_VOL or
SW_INVOL. Assert that one of these is set in mi_switch() and propery
adjust the rusage statistics. This is to simplify the large number of
users of this interface which were previously all required to adjust the
proper counter prior to calling mi_switch(). This also facilitates more
switch and locking optimizations.
- Change all callers of mi_switch() to pass the appropriate paramter and
remove direct references to the process statistics.


# 124732 19-Jan-2004 phk

Add linenumber and source filename to panic(9) output.

Ideally a traceback should be printed too, any takers ?


# 118990 16-Aug-2003 marcel

Further cleanup <machine/cpu.h> and <machine/md_var.h>: move the MI
prototypes of cpu_halt(), cpu_reset() and swi_vm() from md_var.h to
cpu.h. This affects db_command.c and kern_shutdown.c.

ia64: move all MD prototypes from cpu.h to md_var.h. This affects
madt.c, interrupt.c and mp_machdep.c. Remove is_physical_memory().
It's not used (vm_machdep.c).

alpha: the MD prototypes have been left in cpu.h with a comment
that they should be there. Moving them is left for later. It was
expected that the impact would be significant enough to be done in
a seperate commit.

powerpc: MD prototypes left in cpu.h. Comment added.

Suggested by: bde
Tested with: make universe (pc98 incomplete)


# 116398 15-Jun-2003 iedowse

Don't overwrite the static panicstr buffer for secondary and further
panics. Before revision 1.38, we used to just point panicstr at the
format string if panicstr was NULL, but since we now use a static
buffer for the formatted panic message, we have to be careful to
only write to it during the first panic.

Pointed out by: bde


# 116182 10-Jun-2003 obrien

Use __FBSDID().


# 113633 17-Apr-2003 jhb

Lock the sched_lock while setting TDF_INPANIC.


# 113581 16-Apr-2003 phk

Don't include <sys/disklabel.h>


# 110859 14-Feb-2003 alfred

style.


# 110778 12-Feb-2003 peter

Print "Stack backtrace:" right before dumping the backtrace. We cannot
expect end users to automatically recognize a stack trace for what it is.


# 110585 09-Feb-2003 jeff

- Update a printf format for b_flags.


# 108682 04-Jan-2003 phk

Introduce the
void backtrace(void);
function which will print a backtrace if DDB is in the kernel and an
explanation if not.

This is useful for recording backtraces in non-fatal circumstances and
does not require pollution with DDB #includes in the files where it
is used.

It would of course be nice to have a non-DDB dependent version too,
but since the meat of a backtrace is MD it is probably not worth it.


# 107036 18-Nov-2002 alfred

During shutdown explain what the numbers following the 'syncing
disks' message mean, specifically, 'buffers remaining...'.


# 106024 27-Oct-2002 rwatson

Hook up mac_check_system_reboot(), a MAC Framework entry point that
permits MAC modules to augment system security decisions regarding
the reboot() system call, if MAC is compiled into the kernel.

Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# 105531 20-Oct-2002 tmm

Add kernel dump support, based on the ia64 version (which was committed
as sparc64/sparc64/dump_machdep.c a while back).
Other than ia64 (which uses ELF), sparc64 uses a homegrown format for
the dumps (headers are required because the physical address and size of
the tsb must be noted, and because physical memory may be discontiguous);
ELF would not offer any advantages here.

Reviewed by: jake


# 103647 19-Sep-2002 jhb

Add ability to dump stacktraces on kernel panics when DDB is compiled into
the kernel. By default this is turned off since otherwise it could scroll
valuable panic messages off of the screen. This option can be turned on
by the DDB_TRACE kernel option as well as the debug.trace_on_panic sysctl.

Also, fix the DDB_UNATTENDED option to use its own header instead of
abusing opt_ddb.h. This way turning that one option on or off doesn't
force you to recompile all of ddb.

Requested by: many (1), bde (2*)

* - I know bde prefers !abusing option headers in general but can't
remember if he as brought up this specific case.


# 101155 01-Aug-2002 jhb

Revert previous revision which was accidentally committed and has not been
tested yet.


# 101153 01-Aug-2002 jhb

If we fail to write to a vnode during a ktrace write, then we drop all
other references to that vnode as a trace vnode in other processes as well
as in any pending requests on the todo list. Thus, it is possible for a
ktrace request structure to have a NULL ktr_vp when it is destroyed in
ktr_freerequest(). We shouldn't call vrele() on the vnode in that case.

Reported by: bde


# 100209 17-Jul-2002 gallatin

Allow alphas to do crashdumps: Refuse to run anything in choosethread()
after a panic which is not an interrupt thread, or the thread which
caused the panic. Also, remove panicstr checks from msleep() and from
cv_wait() in order to allow threads to go to sleep and yeild the cpu
to the panicing thread, or to an interrupt thread which might
be doing the crashdump.

Reviewed by: jhb (and it was mostly his idea too)


# 99828 11-Jul-2002 jhb

Add a missing newline during panic printf's for SMP systems that don't
have APICS. (Like all the !i386 archs).


# 99072 29-Jun-2002 julian

Part 1 of KSE-III

The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


# 96468 12-May-2002 marcel

Fix alpha build. The alpha has dumpsys implemented.
While here, revert the condition to list the machines
for which dumpsys has not been implemented.

Reported by: wilko


# 94169 08-Apr-2002 phk

Put back dumppcb, but this time we put a comment to tell what it is for.

Brucifixion by: bde


# 93935 06-Apr-2002 nyan

Added the new kernel dumping support for pc98.


# 93650 02-Apr-2002 marcel

Don't compile the dummy dumpsys for ia64.


# 93593 01-Apr-2002 jhb

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


# 93579 01-Apr-2002 phk

Extend a hack to also hack around PC98's definition of __i386__


# 93496 31-Mar-2002 phk

Here follows the new kernel dumping infrastructure.

Caveats:

The new savecore program is not complete in the sense that it emulates
enough of the old savecores features to do the job, but implements none
of the options yet.

I would appreciate if a userland hacker could help me out getting savecore
to do what we want it to do from a users point of view, compression,
email-notification, space reservation etc etc. (send me email if
you are interested).

Currently, savecore will scan all devices marked as "swap" or "dump" in
/etc/fstab _or_ any devices specified on the command-line.

All architectures but i386 lack an implementation of dumpsys(), but
looking at the i386 version it should be trivial for anybody familiar
with the platform(s) to provide this function.

Documentation is quite sparse at this time, more to come.

Details:

ATA and SCSI drivers should work as the dump formatting code has been
removed. The IDA, TWE and AAC have not yet been converted.

Dumpon now opens the device and uses ioctl(DIOCGKERNELDUMP) to set
the device as dumpdev. To implement the "off" argument, /dev/null
is used as the device.

Savecore will fail if handed any options since they are not (yet)
implemented. All devices marked "dump" or "swap" in /etc/fstab
will be scanned and dumps found will be saved to diskfiles
named from the MD5 hash of the header record. The header record
is dumped in readable format in the .info file. The kernel
is not saved. Only complete dumps will be saved.

All maintainer rights for this code are disclaimed: feel free to
improve and extend.

Sponsored by: DARPA, NAI Labs


# 93467 31-Mar-2002 phk

Centralize the "bootdev" and "dumpdev" variables. They are still pretty
bogus all things considered, but at least now they don't camouflage as
being MD variables.


# 91778 07-Mar-2002 jake

Add needed includes of machine/smp.h, remove nested include in sys/smp.h
so that inlines in machine/smp.h can use variables declared in sys/smp.h.


# 90420 08-Feb-2002 julian

Replace accidentally removed setrunqueue()
solves problem with machines failing to sync in booting.
Submitted by: Tor.Egge@cvsup.no.freebsd.org


# 90361 07-Feb-2002 julian

Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,


# 89601 20-Jan-2002 sobomax

Allow dump device be configured as early as possible using loader(8) tunable.
This allows obtaining crash dumps from the panics occured during late stages
of kernel initialisation before system enters into single-user mode.

MFC after: 2 weeks


# 89522 18-Jan-2002 nik

Explain that the admin can safely power down the system as well as
rebooting.


# 88900 05-Jan-2002 jhb

Change the preemption code for software interrupt thread schedules and
mutex releases to not require flags for the cases when preemption is
not allowed:

The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
switching to a higher priority thread on mutex releease and swi schedule,
respectively when that switch is not safe. Now that the critical section
API maintains a per-thread nesting count, the kernel can easily check
whether or not it should switch without relying on flags from the
programmer. This fixes a few bugs in that all current callers of
swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
fast interrupt handlers and the swi_sched of softclock needed this flag.
Note that to ensure that swi_sched()'s in clock and fast interrupt
handlers do not switch, these handlers have to be explicitly wrapped
in critical_enter/exit pairs. Presently, just wrapping the handlers is
sufficient, but in the future with the fully preemptive kernel, the
interrupt must be EOI'd before critical_exit() is called. (critical_exit()
can switch due to a deferred preemption in a fully preemptive kernel.)

I've tested the changes to the interrupt code on i386 and alpha. I have
not tested ia64, but the interrupt code is almost identical to the alpha
code, so I expect it will work fine. PowerPC and ARM do not yet have
interrupt code in the tree so they shouldn't be broken. Sparc64 is
broken, but that's been ok'd by jake and tmm who will be fixing the
interrupt code for sparc64 shortly.

Reviewed by: peter
Tested on: i386, alpha


# 86313 12-Nov-2001 ps

Fix a signed bug in the crashdump code for systems with > 2GB of ram.

Reviewed by: peter


# 85202 19-Oct-2001 peter

Add a sysctl for preventing the sync() in panic() recovery. This can
be so dangerous it isn't funny. eg: if you panic inside NFS or softdep,
and then try and sync you run into held locks and cause either deadlocks,
recursive panics or other interesting chaos. Default is unchanged.


# 83703 20-Sep-2001 peter

decrement the dumping variable after use so we can call it several times
if needed.


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 83312 10-Sep-2001 jhb

- Axe holding_giant as it is not used now anyways and was ok'd by
dillon in an earlier e-mail.
- We don't need to test the console right before we vfprintf() the panicstr
message. The printing of the panic message is a fine console test by
itself and doesn't make useful messages scroll off the screen or tick
developers off in quite the same.

Requested by: jlemon, imp, bmilekic, chris, gsutter, jake (2)


# 83126 05-Sep-2001 peter

Sigh. Dig up text from a signature in a 1994 Usenet post I made and redo
the ..uhh... ``console test'' to avoid another 50 emails about GPL issues.


# 82790 02-Sep-2001 peter

The !RESTARTABLE_PANICS code has some loose ends.


# 82749 01-Sep-2001 dillon

Giant Pushdown. Saved the worst P4 tree breakage for last.

reboot() getpriority() setpriority() rtprio() osetrlimit() ogetrlimit()
setrlimit() getrlimit() getrusage() getpid() getppid() getpgrp()
getpgid() getsid() getgid() getegid() getgroups() setsid() setpgid()
setuid() seteuid() setgid() setegid() setgroups() setreuid() setregid()
setresuid() setresgid() getresuid() getresgid () __setugid() getlogin()
setlogin() modnext() modfnext() modstat() modfind() kldload() kldunload()
kldfind() kldnext() kldstat() kldfirstmod() kldsym() getdtablesize()
dup2() dup() fcntl() close() ofstat() fstat() nfsstat() fpathconf()
flock()


# 82223 23-Aug-2001 jhb

Add a new kernel option RESTARTABLE_PANICS. If this option is present,
then one can restart from a panic by resetting the panicstr variable to
NULL. This commit conditionalizes the previously committed functionality
on this variable. It also removes the __dead2 attribute from the panic()
function so that when one continues from a panic() the behavior will
be predictable.


# 82119 21-Aug-2001 jhb

Clear db_active in boot() so that one can call the boot function (as well
as use the panic command) w/o having to manually clear db_active first
to avoid the db_error() in mi_switch().


# 82115 21-Aug-2001 jhb

Allow one to restart from a panic in DDB by clearing the panicstr
variable to NULL. Note that since panic() is marked with __dead2, this
has somewhat unpredictable results at best.


# 81688 15-Aug-2001 bde

Don't dump on the label sector or below. This avoids clobbering the
label if the dump device overflaps the label (which is a slight
misconfiguration). Dump routines don't use dscheck(), so the normal
write protection of the label doesn't help.

Reduced some nearby overflow bugs. In disk_dumpcheck(), there was
(fatal but fail-safe) overflow on i386's with 4GB of memory, at least
if Maxmem was the top page (can this happen?). The fix assumes that
the sector size divides PAGE_SIZE (dump routines already assume this).
In setdumpdev(), the corresponding overflow occurred with only about
2GB of memory on all machines with 32-bit ints. This allowed setdumpdev()
to succeed when it shouldn't have, but then disk_dumpcheck() failed
safe later. Except in old versions of FreeBSD like RELENG_3 where
there is no disk_dumpcheck().

PR: 28164 (label clobbering part)
MFC after: 1 week


# 78767 25-Jun-2001 jhb

- Sort includes.
- Count the context switches during shutdown when we give ithreads a chance
to run as volutary context switches.

Submitted by: bde (2)


# 76117 29-Apr-2001 grog

Revert consequences of changes to mount.h, part 2.

Requested by: bde


# 76078 27-Apr-2001 jhb

Overhaul of the SMP code. Several portions of the SMP kernel support have
been made machine independent and various other adjustments have been made
to support Alpha SMP.

- It splits the per-process portions of hardclock() and statclock() off
into hardclock_process() and statclock_process() respectively. hardclock()
and statclock() call the *_process() functions for the current process so
that UP systems will run as before. For SMP systems, it is simply necessary
to ensure that all other processors execute the *_process() functions when the
main clock functions are triggered on one CPU by an interrupt. For the alpha
4100, clock interrupts are delievered in a staggered broadcast fashion, so
we simply call hardclock/statclock on the boot CPU and call the *_process()
functions on the secondaries. For x86, we call statclock and hardclock as
usual and then call forward_hardclock/statclock in the MD code to send an IPI
to cause the AP's to execute forwared_hardclock/statclock which then call the
*_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
involve less hackery. Now the cpu doing the forward sets any flags, etc. and
sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically
return so that they can execute ast() and don't bother with setting the
astpending or needresched flags themselves. This also removes the loop in
forward_signal() as sched_lock closes the race condition that the loop worked
around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
a process to act on rather than assuming curproc so that they can be used to
implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
header sys/smp.h declares MI SMP variables and API's. The IPI API's from
machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
SLIST of globaldata structures has become MI and moved into subr_smp.c.
Also, the globaldata list is only available if SMP support is compiled in.

Reviewed by: jake, peter
Looked over by: eivind


# 75858 23-Apr-2001 grog

Correct #includes to work with fixed sys/mount.h.


# 75570 17-Apr-2001 jhb

Blow away the panic mutex in favor of using a single atomic_cmpset() on a
panic_cpu shared variable. I used a simple atomic operation here instead
of a spin lock as it seemed to be excessive overhead. Also, this can avoid
recursive panics if, for example, witness is broken.


# 74890 27-Mar-2001 ps

Last commit was broken.. It always prints '[CTRL-C to abort]'.
Move duplicate code for printing the status of the dump and checking
for abort into a separate function.

Pointy hat to: me


# 73913 07-Mar-2001 jhb

Lock initproc when we send SIGINT to init during shutdown.


# 72358 11-Feb-2001 markm

RIP <machine/lock.h>.

Some things needed bits of <i386/include/lock.h> - cy.c now has its
own (only) copy of the COM_(UN)LOCK() macros, and IMASK_(UN)LOCK()
has been moved to <i386/include/apic.h> (AKA <machine/apic.h>).
Reviewed by: jhb


# 72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


# 72091 06-Feb-2001 asmodai

Fix typo: seperate -> separate.

Seperate does not exist in the english language.


# 71576 24-Jan-2001 jasone

Convert all simplelocks to mutexes and remove the simplelock implementations.


# 70861 10-Jan-2001 jake

Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables
other then curproc.


# 70063 15-Dec-2000 jhb

Stick the kthread API in a kthread_* namespace, and the specialized kproc
functions in a kproc_* namespace.

Reviewed by: -arch


# 69335 28-Nov-2000 jhb

Only print out APIC info on an SMP system during a panic if APIC_IO is
defined.


# 68808 16-Nov-2000 jhb

Don't release and acquire Giant in mi_switch(). Instead, release and
acquire Giant as needed in functions that call mi_switch(). The releases
need to be done outside of the sched_lock to avoid potential deadlocks
from trying to acquire Giant while interrupts are disabled.

Submitted by: witness


# 67365 20-Oct-2000 jhb

Catch up to moving headers:
- machine/ipl.h -> sys/ipl.h
- machine/mutex.h -> sys/mutex.h


# 67164 15-Oct-2000 phk

Remove unneeded #include <machine/clock.h>


# 67095 13-Oct-2000 peter

savectx() is now used exclusively by the crash dump system. Move the
i386 specific gunk (copy %cr3 to the pcb) from the MI dumpsys() to the
MD savectx().


# 67093 13-Oct-2000 ps

Do not allocate a callout for all crashdumps, not just when you panic.


# 65980 17-Sep-2000 bde

Added used include of <sys/mutex.h> (don't depend on pollution in
<sys/signalvar.h>).


# 65764 11-Sep-2000 jhb

Fix some printf format string warnings due to sizeof(int) != sizeof(long) on
the alpha.


# 65707 10-Sep-2000 jasone

Allow interrupt threads to run during shutdown. This should fix the
"dirty buffers during shutdown" problem introduced by the SMPng commit.

Submitted by: tegge, cg


# 65557 06-Sep-2000 jasone

Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).

Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh


# 65395 03-Sep-2000 peter

kern_shutdown.c was more ANSI-C than K&R - remove the remnants of K&R
support with extreme prejudice.


# 65394 03-Sep-2000 peter

gcc knows that savectx() is potentially a setjmp style dual-return
function which may lead to stack lossage and clobbered variables.
This isn't the case here, but there is no way to tell gcc that.

Work around this in a kinda bizzare way, but it shuts gcc up.


# 65268 30-Aug-2000 msmith

Make it possible to pass boot()'s flags to shutdown_nice() so that the
kernel can instigate an orderly shutdown but still determine the form of
that shutdown. Make it possible eg. to cleanly shutdown and power off the
system under ACPI when the power button is pressed.


# 62573 04-Jul-2000 phk

Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.

Pointed out by: bde


# 62454 03-Jul-2000 phk

Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:

Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grog our
sources:

-sysctl_vm_zone SYSCTL_HANDLER_ARGS
+sysctl_vm_zone (SYSCTL_HANDLER_ARGS)


# 60041 05-May-2000 phk

Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by: peter


# 58755 28-Mar-2000 dillon

The SMP cleanup commit broke UP compiles. Make UP compiles work again.


# 55862 12-Jan-2000 luoqi

Seconds to ticks conversion was done at the wrong place.


# 55539 07-Jan-2000 luoqi

Introduce a mechanism to suspend/resume system processes. Suspend syncer
and bufdaemon prior to disk sync during system shutdown.


# 54248 07-Dec-1999 msmith

Change the default poweroff delay from 0 to 5 seconds. This seems to be
adequate for the IDE disks that I have available for testing. Most seem
to wait between 1 and 3 seconds before flushing their caches.

Add the ability to override the delay at compile time via the
undocumented option POWEROFF_DELAY. The delay can still be set via
sysctl as it was originally implemented.


# 54233 06-Dec-1999 phk

I always forget to check before I reboot a system, and while it
boots I try in vain to remember which month or even year this system
was last booted in.

Print out the uptime before rebooting, and give people like me
less (or more as it may be) to think about while the systems boots.


# 53838 28-Nov-1999 phk

Convert dumpon to work on character devices instead of block devices.

NB: You may need to change your /etc/rc.conf!


# 53452 20-Nov-1999 phk

struct mountlist and struct mount.mnt_list have no business being
a CIRCLEQ. Change them to TAILQ_HEAD and TAILQ_ENTRY respectively.

This removes ugly mp != (void*)&mountlist comparisons.

Requested by: phk
Submitted by: Jake Burkholder jake@checker.org
PR: 14967


# 53023 08-Nov-1999 phk

A little bit of nitpicking in the 'syncing disks...' end of a shutdown.


# 52128 11-Oct-1999 peter

Trim unused options (or #ifdef for undoc options).

Submitted by: phk


# 50571 29-Aug-1999 phk

Remove unneeded "maj" variable.

Give up if we have already started dumping once before.

Print name of dumpdev.


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 50253 23-Aug-1999 bde

Use devtoname() to print dev_t's instead of casting them to long or u_long
for misprinting in %lx format.


# 50107 21-Aug-1999 msmith

Implement a new generic mechanism for attaching handler functions to
events, in order to pave the way for removing a number of the ad-hoc
implementations currently in use.

Retire the at_shutdown family of functions and replace them with
new event handler lists.

Rework kern_shutdown.c to take greater advantage of the use of event
handlers.

Reviewed by: green


# 49679 13-Aug-1999 phk

The bdevsw() and cdevsw() are now identical, so kill the former.


# 49627 11-Aug-1999 alfred

When doing a dump, if ENODEV is returned explain what happened to the user,
"the device doesn't support a dump routine"

Only print "dump succeeded" when 0 is returned, instead of when an unexpected
error number is returned, print that error number.

Reviewed by: Eivind


# 49558 09-Aug-1999 phk

Merge the cons.c and cons.h to the best of my ability. alpha may or
may not compile, I can't test it.


# 48948 20-Jul-1999 green

Make a dev2budev() function, and use it. This refixes pstat (working, broken,
working, broken, working) and savecore (working, working, broken, working,
working).

Sorta Reviewed by: phk


# 48944 20-Jul-1999 green

dev2udev() returns a CDEV udev_t, but we use block io in savecore. Savecore
also gets the device by st_rdev, which is alright except for the fact that
the sysctl kern.dumpdev passed out a char device. This is a workaround.
Sorry for not committing the fix earlier, before people started having
problems.


# 48868 17-Jul-1999 phk

Centralize dumpdev handling.


# 48431 01-Jul-1999 peter

Fix a warning - the code is correct but gcc can't tell.


# 48225 26-Jun-1999 mckusick

Convert buffer locking from using the B_BUSY and B_WANTED flags to using
lockmgr locks. This commit should be functionally equivalent to the old
semantics. That is, all buffer locking is done with LK_EXCLUSIVE
requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will
be done in future commits.


# 47084 12-May-1999 peter

Try an fix a couple of dev_t/major/minor etc nits.


# 46676 08-May-1999 phk

I got tired of seeing all the cdevsw[major(foo)] all over the place.

Made a new (inline) function devsw(dev_t dev) and substituted it.

Changed to the BDEV variant to this format as well: bdevsw(dev_t dev)

DEVFS will eventually benefit from this change too.


# 46635 07-May-1999 phk

Continue where Julian left off in July 1998:

Virtualize bdevsw[] from cdevsw. bdevsw() is now an (inline)
function.

Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention
to the order of the cmaj/bmaj arguments!)

Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE
(ditto!)

(Next step will be to convert all bdev dev_t's to cdev dev_t's
before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)


# 46568 06-May-1999 peter

Add sufficient braces to keep egcs happy about potentially ambiguous
if/else nesting.


# 46381 03-May-1999 billf

Add sysctl descriptions to many SYSCTL_XXXs

PR: kern/11197
Submitted by: Adrian Chadd <adrian@FreeBSD.org>
Reviewed by: billf(spelling/style/minor nits)
Looked at by: bde(style)


# 46112 27-Apr-1999 phk

Suser() simplification:

1:
s/suser/suser_xxx/

2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.


# 43437 30-Jan-1999 msmith

An error in the last commit; the changes were submitted by, not reviewed by,
"D. Rock" <rock@cs.uni-sb.de>


# 43436 30-Jan-1999 msmith

Add a new sysctl node kern.shutdown, off which shutdown-related things
can be hung.

Add a tunable delay at the beginning of the SHUTDOWN_FINAL at_shutdown
queue, allowing time to settle before we launch into the list of things
that are expected to turn the system off.

Fix a bug in at_shutdown_pri() where the second insertion always put
the item in second position in the queue.

Reviewed by: "D. Rock" <rock@cs.uni-sb.de>


# 42135 28-Dec-1998 msmith

Improved DDB_UNATTENDED behaviour. From the submitter:

There's something that's been bugging me for a while, so I decided to fix it.
FreeBSD now will DTRT WRT DDB and DDB_UNATTENDED (!debugger_on_panic), at least
in my opinion. The behavior change is such that:

1. Nothing changes when debugger_on_panic != 0.
2. When DDB_UNATTENDED (!debugger_on_panic), if a panic occurs, the
machine will reboot. Also, if a trap occurs, the machine will
panic and reboot, unlike how it broke to DDB before. HOWEVER,
a trap inside DDB will not cause a panic, allowing full use
of DDB without having to worry about the machine being stuck
at a DDB prompt if something goes wrong during the day.
Patches for this behavior follow my signature, and it would
be a boon to anyone (like me) who uses DDB_UNATTENDED, but
actually wants the machine to panic on a trap (otherwise,
what's the use, if the machine causes a fatal trap rather than
a true panic, of debugger_on_panic?). The changes cause no
adverse behavior, but do involve two symbols becoming global

Submitted by: Brian Feldman <green@unixhelp.org>


# 41514 04-Dec-1998 archie

Examine all occurrences of sprintf(), strcat(), and str[n]cpy()
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.

These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.

Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Mike Spengler <mks@networkcs.com>


# 41137 13-Nov-1998 msmith

Don't count non-local dirty buffers as outstanding when shutting down.
This avoids the fsck-on-reboot symptoms if you're shutting down with a
hung or unreachable NFS server mounted. Also remove non-local
filesystems from the mount list to prevent the system hanging when it tries
to unmount them (for the same reason).

Drew points out that there's a good argument for forcibly removing all
"non syncable" filesystems from the mount list (eg. NFS mounts, disks
that aren't responding, etc.) as this then allows you to sync and
cleanly unmount their parents. No such change is included in this
patch.

Submitted by: Andrew Gallatin <gallatin@cs.duke.edu>


# 40751 30-Oct-1998 msmith

Add the ability to specify where on the at_shutdown queue a handler is
installed.

Remove cpu_power_down, and replace it with an entry at the end of the
SHUTDOWN_FINAL queue in the only place it's used (APM).

Submitted by: Some ideas from Bruce Walter <walter@fortean.com>


# 39522 20-Sep-1998 dt

Fix precedence bug, so that kernel dump works.


# 39237 15-Sep-1998 gibbs

Add a new at_shutdown queue, SHUTDOWN_FINAL. This queue is run at
splhigh() after any system dumps have completed. SHUTDOWN_POST_SYNC
isn't quite late enough for disk controllers.

Converted at_shutdown queues to use the queue(3) macros.


# 38874 06-Sep-1998 ache

Store formatted panic string in static buffer to make it available later
for savecore.
Previous code give only panic format to savecore


# 38490 23-Aug-1998 des

Don't check minor number of dump device at all.

Discussed-with: Jörg Wunsch


# 38443 19-Aug-1998 des

Include opt_devfs.h which defines SLICE, to make previous commit
meaningful.

Pointed out by: Luoqi Chen


# 38362 16-Aug-1998 des

Enable kernel dumps on SLICE systems.


# 37555 11-Jul-1998 bde

Fixed printf format errors.


# 36735 07-Jun-1998 dfr

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


# 36135 17-May-1998 tegge

Add forwarding of roundrobin to other cpus. This gives a more regular
update of cpu usage as shown by top when one process is cpu bound
(no system calls) while the system is otherwise idle (except for top).

Don't attempt to switch to the BSP in boot(). If the system was idle when
an interrupt caused a panic, this won't work. Instead, switch to the BSP
in cpu_reset.

Remove some spurious forward_statclock/forward_hardclock warnings.


# 35974 12-May-1998 bde

Backed out previous commit. It is invalid to call d_ioctl() on
possibly non-open devices, and we don't want to restrict dumping
to swap devices anwyay. It is especially invalid to call d_ioctl()
in non-process context for panics. d_psize() can be called on
non-open devices, at least on non-SLICED ones that support d_dump(),
and setdumpdev() has depended on this for a long time although it
is probably wrong, but even d_psize() can't be called in non-process
context - that's why dumpsys() depends on previously computed values
although these values may be stale. The historical restriction to
devices with dkpart(dev) == SWAP_PART should go away.


# 35812 06-May-1998 julian

Add dump support to the DEVFS/slice code.
now we can actually catch our crashes :-)

Submitted by: Luoqi Chen <luoqi@chen.ml.org> (the man who's everywhere)


# 34266 08-Mar-1998 julian

Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman)
Submitted by: Kirk McKusick (mcKusick@mckusick.com)
Obtained from: WHistle development tree


# 33445 16-Feb-1998 eivind

Add HW_WDOG to LINT, and turn it into a new-style option.


# 31403 25-Nov-1997 julian

Shift a few SYSINT() calls around.
this results in a few functions becoming static, and
the SYSINITs being close to the code they are related to.
setting up the dump device is with dumpsys() and
kicking off the scheduler is with the scheduler.
Mounting root is with the code that does it.

Reviewed by: phk


# 31275 18-Nov-1997 bde

Get buffer stuff by #including <sys/buf.h> instead of <sys/vnode.h>.

Staticized boot().

Fixed a gratuitous ANSIism.


# 30994 06-Nov-1997 phk

Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.


# 29128 05-Sep-1997 peter

Cosmetic adjustment for the trap/double fault/panic cpu id listing.
It now prints the apic id in hex rather than decimal.


# 29041 02-Sep-1997 bde

Removed unused #includes.


# 28976 31-Aug-1997 bde

Fixed options SHOW_BUSYBUFS and PANIC_REBOOT_WAIT_TIME which were broken
by incomplete cutting and pasting from machdep.c to kern_shutdown.c.

PR: 3953


# 28809 26-Aug-1997 peter

Correct some things I forgot about until it was too late with smp_active.
smp_active = 1 used to indicate that the system had frozen previously
started AP's, while smp_active = 0 was "AP's not yet started". I have split
this into smp_started (which is set when the AP's come online), and
smp_active is left for turning on/off AP scheduling.


# 28769 25-Aug-1997 bde

Fixed some formatting and style bugs.

Fixed a gratuitous ANSIism.


# 28000 08-Aug-1997 julian

Teach both disk drivers how to cope with a hardware watchdog
while dumping core.. I'm tired of getting 1/2 of a core-dump

conditional on -DHW_WDOG for now
this will migrate to 2.2 as that's where I need it.


# 27997 08-Aug-1997 julian

Use up 4 precious bytes to give the kernel a hook to
support hardware watchdogs. The actual functions would be supplied in an LKM
or a linked file, but they need to hang off something.


# 26812 22-Jun-1997 peter

Preliminary support for per-cpu data pages.

This eliminates a lot of #ifdef SMP type code. Things like _curproc reside
in a data page that is unique on each cpu, eliminating the expensive macros
like: #define curproc (SMPcurproc[cpunumber()])

There are some unresolved bootstrap and address space sharing issues at
present, but Steve is waiting on this for other work. There is still some
strictly temporary code present that isn't exactly pretty.

This is part of a larger change that has run into some bumps, this part is
standalone so it should be safe. The temporary code goes away when the
full idle cpu support is finished.

Reviewed by: fsmp, dyson


# 26657 15-Jun-1997 wollman

When APM is configured, turn off the power when halting for good.


# 26100 24-May-1997 fsmp

Move the printing of "cpu#%d" to AFTER the general panic argument string.
When a panic occurs early in the SMP boot process 'cpunumber()' hangs,
causing the panic string to be lost. Now the system appears to hang
in 'breakpoint()', but at least the user sees the panic string before the
hang.


# 25164 26-Apr-1997 peter

Man the liferafts! Here comes the long awaited SMP -> -current merge!

There are various options documented in i386/conf/LINT, there is more to
come over the next few days.

The kernel should run pretty much "as before" without the options to
activate SMP mode.

There are a handful of known "loose ends" that need to be fixed, but
have been put off since the SMP kernel is in a moderately good condition
at the moment.

This commit is the result of the tinkering and testing over the last 14
months by many people. A special thanks to Steve Passe for implementing
the APIC code!


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 21776 16-Jan-1997 bde

Reduced #include spam in <sys/sysproto.h> and fixed things that depended
on it.

makesyscalls.sh:
This parsed $Id$. Fixed(?) to parse $FreeBSD$. The output is wrong when
the id is not expanded in the source file.

syscalls.master:
Fixed declaration of sigsuspend(). There are still some bogons and
spam involving sigset_t.
Use `struct foo *' instead of the equivalent `foo_t *' for some nfs and
lfs syscalls so that <sys/sysproto.h> doesn't depend on <sys/mount.h>.


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 19274 30-Oct-1996 julian

Further improved version of hadling a HALT when there is no console.


# 19268 30-Oct-1996 julian

if there is no console, cngetc should act like getc and return -1

make callers aware of this in those cases where it can occur.


# 18290 14-Sep-1996 bde

Changed cncheckc() interface so that it is 8-bit clean - return -1
instead of 0 if there is no input.


# 18277 13-Sep-1996 bde

Don't use __dead in the kernel. It was an obfuscation for gcc >= 2.5
and a no-op for gcc >= 2.6.


# 18113 07-Sep-1996 sos

Fixed two small leftovers form PHK's mega devconf removal commit..


# 18084 06-Sep-1996 phk

Remove devconf, it never grew up to be of any use.


# 17834 26-Aug-1996 julian

Remove the old cleanup code as it is no longer used..
also fix two cases of = instead of ==
(cut+paste bug duplication)


# 17768 22-Aug-1996 julian

Some cleanups to the callout lists recently added.
note that at_shutdown has a new parameter to indicate When
during a shutdown the callout should be made. also
add a RB_POWEROFF flag to reboot "howto" parameter..
tells the reboot code in our at_shutdown module to turn off the UPS
and kill the power. bound to be useful eventually on laptops


# 17677 19-Aug-1996 julian

Collect all the functioons concerned with rebooting into one place
also add the at_shutdown callout list, and change the one user of
the present (broken) method (the vn driver) to use the new scheme.


# 17658 19-Aug-1996 julian

move all functions related to shutting down to one file
called kern_shutdown.c

note: I couldn't see anything machine dependant in the
functions boot() and dumpsys() which were in machdep.c
I have left a prototype for cpu_boot() which would go in
machdep.c, but I have nothing to put in it. Iexpect others will
let me know in no uncertain ways that this or that is machine dependant
and should be there, but I'll way for that to happen.. :)

I haven't actually taken the functions OUT of machdep
or anywhere else yet.. I'm checking in this file so others can have a look
at it and comment. SO PLEASE DO COMMENT!

I am also (in another checkin) addinf a man(9) page for the new
at_shotdown().. er freudian slip there.. at_shutdown() call
so have a look at that (and at_exit and at_fork as well)
and feed me comments..

I'll heck in the changes to make these (shutdown) changes active tomorrow
if no-one objects too strongly..