History log of /netbsd-current/sys/kern/init_sysctl.c
Revision Date Author Comments
# 1.228 09-Sep-2023 christos

Move the initialization of the random hash for addresses earlier so that
it does not happen under a spin lock context (when it is first used).


Revision tags: netbsd-10-base bouyer-sunxi-drm-base thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.227 20-Sep-2020 skrll

KNF (sort #includes and remove duplicate sys/cpu.h)


# 1.226 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.225 22-Mar-2020 ad

Merge vfs_cache.c from the ad-namecache branch. With this the namecache
index becomes per-directory (initially, a red-black tree). The remaining
changes on the branch to namei()/getcwd() will be merged in the future.


Revision tags: is-mlppp-base ad-namecache-base3 ad-namecache-base2
# 1.224 18-Jan-2020 skrll

Use 4K pages on ARM_MMU_EXTENDED platforms (all armv[67] except RPI) by
creating a new pool l1ttpl for the userland L1 translation table which
needs to be 8KB and 8KB aligned.

Limit the pool to maxproc and add hooks to allow the sysctl changing of
maxproc to adjust the pool.

This comes at a 5% performance penalty for build.sh -j8 kernel on a
Tegra TK1.


Revision tags: ad-namecache-base1 ad-namecache-base
# 1.223 02-Jan-2020 thorpej

branches: 1.223.2;
- Eliminate the global "boottime" variable, which was being accessed
without any synchronization against changes by e.g. clock_settime().
- Replace with new getbinboottime() / getnanoboottime() / getmicroboottime()
functions (naming mirrors that of other time access functions in kern_tc.c).
It returns the (maybe-converted) value of timebasebin, which also tracks
our estimate of when the system was booted (i.e. the legacy "boottime" was
redundant).

XXX There needs to be a lockless synchronization mechanism for reading
timebasebin, but this is a problem in kern_tc.c that pre-existed these
"boottime" changes. At least now the problem is centralized in one location.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118
# 1.222 15-Jan-2019 mrg

remove kern.panic_now -- crashme panic node replaces it.


Revision tags: pgoyette-compat-1226
# 1.221 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
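
The tri-state policy described in this entry can be sketched roughly as follows. This is a userland illustration only: the enum and function names are hypothetical, and the kernel's own get_expose_address() is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three kern.expose_address values. */
enum expose_mode {
	EXPOSE_NONE = 0,	/* no access */
	EXPOSE_KMEM = 1,	/* access for processes with /dev/kmem open */
	EXPOSE_ALL  = 2		/* access for everyone */
};

/* Default from the log entry: 0 on KASLR kernels, 1 otherwise. */
static enum expose_mode
default_expose_mode(bool kaslr)
{
	return kaslr ? EXPOSE_NONE : EXPOSE_KMEM;
}

/*
 * Decide once per sysctl request (not once per process) whether
 * kernel addresses may be copied out to the caller.
 */
static bool
may_expose_address(int mode, bool has_kmem_open)
{
	assert(mode >= EXPOSE_NONE && mode <= EXPOSE_ALL);
	switch (mode) {
	case EXPOSE_ALL:
		return true;
	case EXPOSE_KMEM:
		return has_kmem_open;
	default:
		return false;
	}
}
```

Evaluating the decision once per request and reusing the result for every process record is the efficiency point the entry makes.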


# 1.220 03-Dec-2018 christos

Expose addresses depending on the KASLR setting (from mrg@). Restores the
status quo of exposing kernel addresses if there is no KASLR.


Revision tags: pgoyette-compat-1126
# 1.219 24-Nov-2018 maxv

Fix kernel pointer leaks in the kern.lwp sysctl.


Revision tags: pgoyette-compat-1020
# 1.218 05-Oct-2018 christos

Provide a sysctl kern.expose_address to expose kernel addresses in
sysctl structure returns for non-root. Defaults to off. Turning it
on will restore sockstat/fstat and friends for regular users.


Revision tags: pgoyette-compat-0930
# 1.217 16-Sep-2018 mrg

CTL_DEBUG_MAXID is only used to size a static array that the compiler
can do just fine itself. use the compiler and remove the define.


Revision tags: pgoyette-compat-0906
# 1.216 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
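
The truncation hazard motivating the rename can be illustrated with a small sketch. The helper names below are hypothetical stand-ins (the real helpers live in libkern), and the demonstration assumes a platform where unsigned int is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the helper described above: defined on unsigned int. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Hypothetical 64-bit counterpart, for comparison only. */
static uint64_t
u64min_sketch(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Assumes a 32-bit unsigned int. */
static void
truncation_demo(void)
{
	uint64_t big = (UINT64_C(1) << 32) + 5;

	/* The cast makes explicit what an implicit conversion would do. */
	assert(uimin_sketch((unsigned int)big, 100u) == 5u);	/* wrong minimum */
	assert(u64min_sketch(big, 100) == 100);			/* intended minimum */
}
```

With the old generic name the narrowing happened silently at the call site; a name that states the operand type makes it easier to spot in review.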


# 1.215 22-Aug-2018 msaitoh

- Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.214 04-Feb-2018 maxv

branches: 1.214.2; 1.214.4;
Add a proper defflag for GPROF, and include opt_gprof.h, otherwise we're
not gonna go very far.


Revision tags: netbsd-8-2-RELEASE netbsd-8-1-RELEASE netbsd-8-1-RC1 netbsd-8-0-RELEASE netbsd-8-0-RC2 netbsd-8-0-RC1 tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.213 01-Jun-2017 chs

remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.212 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.211 31-May-2016 pgoyette

branches: 1.211.2;
Add a new kern.messages sysctl to allow kernel message verbosity to be
altered after boot.

Fixes PR kern/46539 using patch submitted by Nat Sloss.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.210 09-Nov-2015 pgoyette

Whether or not the semaphore code is loaded as a module or built-in, its
sysctl data belongs with the module code. Move it from kern/init_sysctl.c
to kern/uipc_sem.c

While here, add a new sysctl variable kern.posix.semcnt (current count of
semaphores) to complement the existing kern.posix.semmax (maximum number
of semaphores).


Revision tags: nick-nhusb-base-20150921
# 1.209 25-Aug-2015 pooka

Move a bunch of sysctl nodes from init_sysctl (kitchen sink sysctl file)
to init_sysctl_base (only base kernel defs). Main motivation was to
fix sysconf(_SC_NPROCESSORS) for Rumprun. As reported by neeraj on irc,
it returned -1 before this fix, so we were doing imaginary computing.


# 1.208 07-Jul-2015 justin

Move hw.machine and hw.machine_arch sysctls to base so rump can use them

This allows uname(3) and uname(1) to work on rump kernels.


Revision tags: nick-nhusb-base-20150606
# 1.207 20-May-2015 pooka

group msgbuf sysctls with the msgbuf code
(init_sysctl.c -> subr_log.c)


# 1.206 13-May-2015 pgoyette

More preparation for modularizing the SYSVxxx options. Here we
change the kern.ipc.sysvxxx sysctls into dynamic values, so each
sub-component of SYSVxxx can declare its own availability.


# 1.205 22-Apr-2015 pooka

move clock sysctls from init_sysctl.c to kern_clock.c


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.204 03-Aug-2014 apb

branches: 1.204.4;
BUILDINFO part 2: expose sysctl kern.buildinfo


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.203 08-May-2014 hannken

Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15
# 1.202 24-Mar-2014 christos

branches: 1.202.2;
- create cpu_{g,s}etmodel() and hide cpu_model from direct access.


Revision tags: riastradh-drm2-base3
# 1.201 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.200 25-Feb-2014 justin

Add kern.{ostype,osrelease,osrevision,version} kern.domainname,
kern.rawpartition sysctl support to rump kernel.
Moved the sysctl support that is shared between rump and normal
kernels to init_sysctl_base.c as rump cannot use init_sysctl.c
in order to avoid code duplication. Agreed with pooka@.


# 1.199 17-Jan-2014 pooka

Put cprng sysctls into subr_cprng.c. Also, make sysctl_prng static
in subr_cprng and get rid of SYSCTL_PRIVATE namespace leak macro.

Fixes ping(8) when run against a standalone rump kernel due to appearance
of the kern.urandom sysctl node (in case someone was wondering ...)


# 1.198 14-Sep-2013 joerg

GC various arrays defined and used in kern_proc.c


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.197 18-Mar-2013 para

branches: 1.197.6;
calculate vnode cache size based on the resource it gets allocated from
this stops kern.maxvnodes from being set so high that it exhausts available space in kmem

http://mail-index.netbsd.org/tech-kern/2013/03/08/msg015095.html


# 1.196 07-Mar-2013 matt

Add a kern.configname sysctl object.


# 1.195 21-Feb-2013 pgoyette

Move boottime50 and its associated sysctl into the compat module. As
noted on tech-kern. Should fix PR/47579.

OK christos@

Will request pull-up to 6.0 in a few days.


# 1.194 02-Feb-2013 matt

Make the inclusion of <sys/cprng.h> a private matter for sysctl. No reason
to expose the rest of the kernel to it.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.193 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


# 1.192 08-Oct-2012 pooka

put all kern socket sysctls in the same place


# 1.191 03-Oct-2012 mlelstv

Add sanity check to sysctl_kern_maxvnodes.


# 1.190 02-Jun-2012 dsl

branches: 1.190.2;
Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.189 07-Apr-2012 christos

remove bogus check.


Revision tags: jmcneill-usbmp-base8 jmcneill-usbmp-base7
# 1.188 10-Mar-2012 joerg

P1003_1B_SEMAPHORE is no longer optional.


Revision tags: jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.187 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-0-5-RELEASE netbsd-6-0-4-RELEASE netbsd-6-0-3-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.186 17-Dec-2011 tls

branches: 1.186.2;

Separate /dev/random pseudodevice implementation from kernel entropy pool
implementation. Rewrite pseudodevice code to use cprng_strong(9).

The new pseudodevice is cloning, so each caller gets bits from a stream
generated with its own key. Users of /dev/urandom get their generators
keyed on a "best effort" basis -- the kernel will rekey generators
whenever the entropy pool hits the high water mark -- while users of
/dev/random get their generators rekeyed every time key-length bits
are output.

The underlying cprng_strong API can use AES-256 or AES-128, but we use
AES-128 because of concerns about related-key attacks on AES-256. This
improves performance (and reduces entropy pool depletion) significantly
for users of /dev/urandom but does cause users of /dev/random to rekey
twice as often.

Also fixes various bugs (including some missing locking and a reseed-counter
overflow in the CTR_DRBG code) found while testing this.

For long reads, this generator is approximately 20 times as fast as the
old generator (dd with bs=64K yields 53MB/sec on 2Ghz Core2 instead of
2.5MB/sec) and also uses a separate mutex per instance so concurrency
is greatly improved. For reads of typical key sizes for modern
cryptosystems (16-32 bytes) performance is about the same as the old
code: a little better for 32 bytes, a little worse for 16 bytes.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base
# 1.185 20-Nov-2011 tls

branches: 1.185.2;
An undocumented behavior of the sysctl kern.arandom node used to allow
sucking up to 8192 bytes out of the kernel arc4random() generator at a
time. Supposedly some very old application code uses this to rekey
other instances of RC4 in userspace (a truly great idea). Reduce the
limit to 256 bytes -- and note that it will probably be reduced to
sizeof(int) in the future, since this node is so documented.


# 1.184 19-Nov-2011 tls

First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.


Revision tags: jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.183 30-Aug-2011 bouyer

branches: 1.183.2;
Add getlabelusesmbr(), as proposed in
http://mail-index.netbsd.org/tech-userlevel/2011/08/25/msg005404.html
This is used by disk tools such as disklabel(8) to dynamically decide if
the underlying platform uses a disklabel-in-mbr-partition or not
(instead of using a compile-time list of ports).
getlabelusesmbr() reads the sysctl kern.labelusesmbr, which takes its value from the
machdep #define LABELUSESMBR.
For evbmips, make LABELUSESMBR 1 if the platform uses pmon
as bootloader, and 0 (the previous value) otherwise.


# 1.182 23-Jul-2011 jym

When KERN_SA is not defined, kern.no_sa_support is a constant (1). So
add CTLFLAG_IMMEDIATE to flags. Make the macro block logically reversed so
it looks more natural when reading.

Reported by Peter Tworek on tech-kern@.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.181 24-May-2011 joerg

Add some needed __UNCONST


# 1.180 02-Apr-2011 rmind

vfs_drainvnodes: drop lwp argument, remove variable name in prototype.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.179 05-Feb-2011 christos

avoid code duplication.


# 1.178 28-Jan-2011 pooka

migrate compat32 handling with previous

pointed out by Lars Heidieker


# 1.177 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


# 1.176 22-Jan-2011 christos

Use the L_ flags instead of the P_ flags for lwps.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.175 01-Jul-2010 rmind

branches: 1.175.2; 1.175.4;
Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


# 1.174 16-Jun-2010 pooka

Set kinfo_lwp to 0 before filling it so that if someone removes
variable assignments from here, kernel memory does not leak to
userspace.

Bug found, a little bit surprisingly, by the atf ps test which failed
due to the column width between the -o holdcnt column being too
wide due to the contents displayed being garbage.
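
The defensive zeroing described in this entry can be sketched as below. The structure and its fields are hypothetical stand-ins for kinfo_lwp, chosen only to show why clearing the record first matters.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a record that is copied out to userspace. */
struct kinfo_sketch {
	int	pid;
	int	holdcnt;
	char	name[16];
};

static void
fill_kinfo_sketch(struct kinfo_sketch *ki, int pid, const char *name)
{
	assert(ki != NULL && name != NULL);

	/* Clear everything first, including padding and unset fields. */
	memset(ki, 0, sizeof(*ki));

	ki->pid = pid;
	strncpy(ki->name, name, sizeof(ki->name) - 1);
	/* holdcnt is deliberately not assigned: it reads as 0, not garbage. */
}
```

Any field a later change forgets to assign then reads as zero instead of whatever was previously in that memory.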


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.173 13-Feb-2010 yamt

branches: 1.173.2;
sysctl_doeproc: don't follow a possibly stale pointer.


Revision tags: uebayasi-xip-base
# 1.172 13-Jan-2010 pooka

branches: 1.172.2;
Minimize unnecessary differences in rump.


# 1.171 24-Dec-2009 elad

When reporting open files using sysctl, don't use 'filehead' to fetch files,
as we don't have a process context to authorize on. Instead, traverse the
file descriptor table of each process -- as we already do in one case.

Introduce a "marker" we can use to mark files we've seen in an iteration, as
the same file can be referenced more than once.

Hopefully this availability of filtering by process also makes life easier
for those who are interested in implementing process "containers" etc.


Revision tags: matt-premerge-20091211
# 1.170 12-Dec-2009 dsl

Report L_INMEM in the lwp info as well.


# 1.169 12-Dec-2009 dsl

Always set L_INMEM to maintain binary compatibility.


Revision tags: jym-xensuspend-nbase
# 1.168 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.167 16-Sep-2009 pooka

Chop init_sysctl into base nodes (init_sysctl_base.c) and the
kitchen sink (init_sysctl.c). Further surgery may be needed down
the line.


Revision tags: yamt-nfs-mp-base8
# 1.166 11-Sep-2009 apb

Expose the kernel's boothowto(9) variable through the sysctl
kern.boothowto variable.

Part of the /etc/rc silent changes requested in PR 41946
and proposed in tech-userlevel.


Revision tags: yamt-nfs-mp-base7
# 1.165 16-Aug-2009 christos

provide compatibility for the older variant of kern.consdev, which used
a 32 bit dev_t. Reported by mrg.


Revision tags: jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.164 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


# 1.163 16-May-2009 yamt

sysctl_doeproc:
- simplify.
- KERN_PROC: fix possible stale proc pointer dereference.
- KERN_PROC: don't do copyout with proc_lock held.


Revision tags: yamt-nfs-mp-base4 jym-xensuspend-base
# 1.162 12-May-2009 yamt

don't forget to skip marker processes.


# 1.161 04-May-2009 yamt

sysctl_doeproc: fix a bug in rev.1.135.
don't forget to mark our marker process PK_MARKER.
this fixes crashes in sched_pstats, etc.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.160 29-Mar-2009 mrg

- add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.

- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.

- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)

- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)

- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.

- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but that it's best left
as part of the full-update of compat/freebsd.)


this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.

tested on i386 and sparc64, build tested on several other platforms.

thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)


# 1.159 11-Mar-2009 mrg

like KERN_FILE2: *do* update "needed" when there is no count. we want
userland to know what sort of size to provide..

while here, slightly normalise the previous to init_sysctl.c.


# 1.158 11-Mar-2009 mrg

always calculate "needed" for KERN_FILE2 calls. this allows a caller
to get an estimate of the needed space, like the intention is.
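
The size-probe behaviour described here might look like the following sketch. The function and parameter names are hypothetical and this is not the KERN_FILE2 handler itself, only an illustration of always reporting the required size.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Always report the space required in *needed, even when the caller
 * passes no buffer, so that a first call can be used as a size probe.
 */
static int
copy_records_sketch(const int *records, size_t nrecords,
    void *buf, size_t buflen, size_t *needed)
{
	assert(records != NULL && needed != NULL);

	*needed = nrecords * sizeof(records[0]);
	if (buf == NULL)
		return 0;		/* size probe only */
	if (buflen < *needed)
		return ENOMEM;		/* caller can retry with *needed */
	memcpy(buf, records, *needed);
	return 0;
}
```

A caller would typically probe with a NULL buffer, allocate the reported size (possibly with some slack, since the data can grow between calls), and call again.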


# 1.157 08-Mar-2009 ad

Don't bother with file_t::f_iflags any more, as it's not used.
Noted by mrg@.


Revision tags: nick-hppapmap-base2
# 1.156 13-Feb-2009 apb

Use "defopt MODULAR" in sys/conf/files, and #include "opt_modular.h"
in all kernel sources that use the MODULAR option.
Proposed in tech-kern on 18 Jan 2009.


Revision tags: mjf-devfs2-base
# 1.155 17-Jan-2009 cegger

branches: 1.155.2;
whitespace nit


# 1.154 17-Jan-2009 yamt

malloc -> kmem_alloc.


# 1.153 11-Jan-2009 christos

merge christos-time_t


Revision tags: christos-time_t-nbase christos-time_t-base
# 1.152 29-Dec-2008 pooka

Rename specfs_lock as device_lock and move it from specfs to devsw.
Relaxes kernel dependency on vfs.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.151 28-Nov-2008 elad

PR/40002: Daniel Horecki: sockstat doesn't work for user with sysctl
security.curtain=1

If the kauth call failed, we'd silently continue the loop, but the error
code would remain and eventually "leak" to userspace. Reset the error to
zero when continuing.

Tested by snj@ and myself. Okay snj@.


# 1.150 12-Nov-2008 ad

Allow the POSIX semaphore code to be loaded as a module.


Revision tags: netbsd-5-base matt-mips64-base2
# 1.149 22-Oct-2008 ad

branches: 1.149.2; 1.149.4;
Set kern.posix_semaphores at runtime so it can be a module.
(Picked wrong header the last time.)


# 1.148 22-Oct-2008 ad

Set kern.posix_semaphores at runtime so it can be a module.


Revision tags: haad-dm-base1
# 1.147 19-Oct-2008 christos

rename proc_representative_lwp to proc_active_lwp and clarify it is for
ps display purposes. suggested by rmind.


# 1.146 19-Oct-2008 christos

Select a "representative" lwp instead of the first lwp in the list. The
first lwp in the list is the last created and in the firefox and gtk-gnash
case this is usually a zombie, so the status in ps was ZLl. This now picks
the lwp in order ONPROC > RUN > SLEEP > STOP > SUSPENDED > IDL > DEAD > ZOMB
and breaks ties using cpticks.
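
A minimal sketch of the selection order described above follows. The types and names are hypothetical, and the direction of the tie-break (more cpticks wins) is an assumption, since the entry only says that ties are broken using cpticks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states, listed from most to least preferred. */
enum lwp_state_sketch {
	ST_ONPROC, ST_RUN, ST_SLEEP, ST_STOP,
	ST_SUSPENDED, ST_IDL, ST_DEAD, ST_ZOMB
};

struct lwp_sketch {
	enum lwp_state_sketch	state;
	unsigned int		cpticks;
};

/* Returns the preferred LWP, or NULL if the list is empty. */
static const struct lwp_sketch *
pick_representative(const struct lwp_sketch *lwps, size_t n)
{
	const struct lwp_sketch *best = NULL;

	assert(lwps != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct lwp_sketch *l = &lwps[i];

		/* Assumption: among equal states, more cpticks wins. */
		if (best == NULL || l->state < best->state ||
		    (l->state == best->state && l->cpticks > best->cpticks))
			best = l;
	}
	return best;
}
```

With a zombie at the head of the list, as in the firefox case mentioned above, the sketch skips it in favour of any runnable or sleeping LWP.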


# 1.145 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.144 15-Jul-2008 christos

make l_flags contain more stuff. Fixes top thread display where system processes
were always displayed.


# 1.143 02-Jul-2008 rmind

branches: 1.143.2;
Remove proc_representative_lwp(), use a simple LIST_FIRST() instead.
OK by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 wrstuden-revivesa-base
# 1.142 16-Jun-2008 ad

PR kern/38927: processes getting stuck in uvm_map (cv_timedwait), hanging
machine

Assume that a vnode (and associated data structures) costs 2kB in the
worst imaginable case. Don't allow sysctl to set desiredvnodes to a
value that would use more than 75% of KVA or 75% of physical memory.
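
The limit described above amounts to a simple calculation, sketched here in userland C. The names are hypothetical and the 2kB figure is the worst-case estimate quoted in the entry, not a measured value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VNODE_COST_SKETCH	2048u	/* assumed worst-case bytes per vnode */

/* Largest vnode count that stays within 75% of both KVA and RAM. */
static uint64_t
max_vnodes_sketch(uint64_t kva_bytes, uint64_t phys_bytes)
{
	uint64_t limit = kva_bytes < phys_bytes ? kva_bytes : phys_bytes;

	assert(limit > 0);
	return (limit / 4 * 3) / VNODE_COST_SKETCH;
}

/* A sysctl handler would reject requests above the computed maximum. */
static bool
vnode_request_ok(uint64_t requested, uint64_t kva_bytes, uint64_t phys_bytes)
{
	return requested <= max_vnodes_sketch(kva_bytes, phys_bytes);
}
```

For example, with 1 GiB of KVA and 4 GiB of physical memory the smaller resource governs, giving a ceiling of 393216 vnodes under these assumptions.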


# 1.141 16-Jun-2008 ad

- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


Revision tags: yamt-pf42-base3
# 1.140 31-May-2008 ad

branches: 1.140.2;
Kill devsw_lock and just use specfs_lock. The two would need merging
in order to prevent unload of modules when a device that they provide
is still open.


# 1.139 25-May-2008 christos

don't forget to fill in the emulation.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.138 12-May-2008 ad

Use cpu_index(), not ci_cpuid.


# 1.137 30-Apr-2008 ad

branches: 1.137.2;
KERN_FILE_BYPID: fix locking botch.


# 1.136 29-Apr-2008 ad

Don't try grabbing a zombie's p_reflock.


# 1.135 29-Apr-2008 ad

PR kern/37917 /bin/ps no longer shows zombies


# 1.134 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.133 24-Apr-2008 ad

branches: 1.133.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.132 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.131 05-Apr-2008 yamt

branches: 1.131.2;
- l_wmesg is not always valid. check l_wchan when using l_wmesg.
should fix a crash reported by Juan RP on current-users@.
- ttyinfo: lock lwp when accessing l_wmesg.
- fill_lwp: add an assertion.


# 1.130 04-Apr-2008 cegger

use device_xname() where appropriate
OK martin


# 1.129 02-Apr-2008 xtraeme

Revert rev 1.126-1.128. The original code was correct and rmind and I
didn't look correctly at them.


# 1.128 01-Apr-2008 xtraeme

When copying l_name and l_wmesg use KI_LNAMELEN and KI_WMESGLEN
respectively, so that we don't care if l_name/wmesg is longer
than kl_name/wmesg and the KASSERTs added in previous can go away.


# 1.127 01-Apr-2008 xtraeme

Fix previous: use the length of l->l_foo not kl->l_foo and add
two KASSERTs to check for max length limits before copying.

As suggested by rmind@.


# 1.126 01-Apr-2008 xtraeme

fill_lwp: when copying l_wmesg and l_name, use the size of the string
not of the variable.

Found and ok by rmind@.


# 1.125 27-Mar-2008 ad

branches: 1.125.2;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.124 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.123 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.122 30-Jan-2008 ad

branches: 1.122.2; 1.122.6;
Another locking botch.


# 1.121 28-Jan-2008 ad

More file/proc locking fixes.


Revision tags: bouyer-xeni386-nbase
# 1.120 23-Jan-2008 elad

Tons of process scope changes.

- Add a KAUTH_PROCESS_SCHEDULER action, to handle scheduler related
requests, and add specific requests for set/get scheduler policy and
set/get scheduler parameters.

- Add a KAUTH_PROCESS_KEVENT_FILTER action, to handle kevent(2) related
requests.

- Add a KAUTH_DEVICE_TTY_STI action to handle requests to TIOCSTI.

- Add requests for the KAUTH_PROCESS_CANSEE action, indicating what
process information is being looked at (entry itself, args, env,
open files).

- Add requests for the KAUTH_PROCESS_RLIMIT action indicating set/get.

- Add requests for the KAUTH_PROCESS_CORENAME action indicating set/get.

- Make bsd44 secmodel code handle the newly added requests appropriately.

All of the above make it possible to issue finer-grained kauth(9) calls in
many places, removing some KAUTH_GENERIC_ISSUSER requests.

- Remove the "CAN" from KAUTH_PROCESS_CAN{KTRACE,PROCFS,PTRACE,SIGNAL}.

Discussed with christos@ and yamt@.


Revision tags: bouyer-xeni386-base
# 1.119 12-Jan-2008 ad

sysctl_kern_proc_args: avoid zero length allocation.


Revision tags: matt-armv6-base
# 1.118 07-Jan-2008 ad

Patch up sysctl locking:

- Lock processes, credentials, filehead etc correctly.
- Acquire a read hold on sysctl_treelock if only doing a query.
- Don't wire down the output buffer. It doesn't work correctly and the code
regularly does long term sleeps with it held - it's not worth it.
- Don't hold locks other than sysctl_lock while doing copyout().
- Drop sysctl_lock while doing copyout / allocating memory in a few places.
- Don't take kernel_lock for sysctl.
- Fix a number of bugs spotted along the way


# 1.117 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.116 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.115 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.114 10-Dec-2007 elad

- Use KAUTH_ARG() instead of casts,
- Don't ignore return value of settime() in sysctl_kern_rtc_offset(), as
suggested by yamt@.

Note: the kauth(9) call in sysctl_kern_rtc_offset() is bogus, but this will
be addressed separately.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base jmcneill-pm-base reinoud-bufcleanup-base
# 1.113 06-Nov-2007 ad

branches: 1.113.2; 1.113.4; 1.113.6;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.112 19-Oct-2007 ad

branches: 1.112.2;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h


Revision tags: yamt-x86pmap-base4
# 1.111 16-Oct-2007 christos

branches: 1.111.2;
Don't fail to produce the argument vector if the program has modified it
by deleting arguments. This is a popular practice, and failing means that
ps(1) prints (programname). For example, this is what XtOpenDisplay() does
with -geometry. This used to work before 2.0H, and the behavior is allowed and
hinted by POSIX. Found out by Anon Ymous.


# 1.110 16-Oct-2007 christos

- fix comment sentence capitalization.
- whitespace cleanup.
No functional changes.


# 1.109 15-Oct-2007 ad

Add _SC_NPROCESSORS_ONLN and _SC_NPROCESSORS_CONF for sysconf(). These
are extensions but are provided by many Unix systems.
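
A minimal userland sketch of querying these two values through sysconf().
The helper names online_cpus() and configured_cpus() are hypothetical and
not part of this change; the fallback to 1 on error is an assumption made
for illustration.

```c
#include <unistd.h>

/*
 * Return the number of CPUs currently online, falling back to 1 when
 * sysconf() reports an error (-1) or an otherwise unusable value.
 */
long
online_cpus(void)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);

	return n > 0 ? n : 1;
}

/* Same fallback for the number of configured CPUs. */
long
configured_cpus(void)
{
	long n = sysconf(_SC_NPROCESSORS_CONF);

	return n > 0 ? n : 1;
}
```

A caller sizing a worker pool would typically use online_cpus(), since
configured CPUs may include processors that are not currently running.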


Revision tags: yamt-x86pmap-base3
# 1.108 13-Oct-2007 rmind

sysctl_kern_lwp: Use a correct variable when rechecking if LWP still
exists after relocking. Found via CID: 4689. OK by <dsl>.


Revision tags: vmlocking-base
# 1.107 08-Oct-2007 ad

Merge from vmlocking: don't hold scheduler locks across copyout().


Revision tags: yamt-x86pmap-base2
# 1.106 28-Sep-2007 joerg

Add kern.no_sa_support to easily detect whether a kernel supports
Scheduler Activation or not. This is a negative name as ld.so.conf
conditionals treat undefined sysctls like 0.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.105 15-Aug-2007 ad

branches: 1.105.2; 1.105.4;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.104 06-Aug-2007 yamt

branches: 1.104.2;
remove a homegrown definition of CPU_INFO_FOREACH.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.103 09-Jul-2007 ad

branches: 1.103.2; 1.103.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.102 30-Jun-2007 dsl

Add a flags parameter to kauth_cred_get/setgroups() so that sys_get/setgroups
can copy directly to/from userspace.
Avoids exposing the implementation of the group list as an array to code
outside kern_auth.c.
compat code and man page need updating.


# 1.101 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.100 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


Revision tags: thorpej-atomic-base
# 1.99 11-Mar-2007 ad

branches: 1.99.2;
Add the LWP's runtime to kinfo_lwp.


# 1.98 09-Mar-2007 ad

branches: 1.98.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


Revision tags: ad-audiomp-base
# 1.97 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.96 15-Feb-2007 ad

branches: 1.96.2;
Count the number of CPUs at boot and stash in 'ncpu'. Eventually should
have each CPU register at attach, so we can figure out the topology for
the scheduler.


Revision tags: post-newlock2-merge
# 1.95 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase newlock2-base
# 1.94 22-Jan-2007 elad

Don't rely on KAUTH_PROCESS_CANSEE for environment just yet,
otherwise we're allowing anyone to read the environment unless
curtain is enabled.

From yamt@.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 netbsd-4-base
# 1.93 27-Nov-2006 elad

branches: 1.93.2;
Move Veriexec's sysctl(9) setup routine and helper to kern_verifiedexec.c.


# 1.92 25-Nov-2006 christos

PR/34837: Mindaguas: Add SysV SHM dynamic reallocation and locking to the
physical memory


# 1.91 01-Nov-2006 christos

implement kern.arandom properly, instead of lying about it and only filling
the first 4 bytes of the array with random data.


# 1.90 29-Oct-2006 christos

add the emulation in kinfo_proc2


Revision tags: yamt-splraiseipl-base2
# 1.89 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.88 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


# 1.87 24-Sep-2006 dogcow

correct dcopyout #define for !KTRACE case.


# 1.86 23-Sep-2006 manu

Add a -t+S flag to ktrace for tracing activity related to sysctl. MIB
names will be displayed, with data read and written as well.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.85 13-Sep-2006 elad

branches: 1.85.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.84 10-Sep-2006 manu

When getting the program argument or environment strings, we previously
assumed that all the strings were stored in a row, separated by NUL chars,
at the address pointed to by argv[0] (or envp[0]).

This was wrong: if the program changed argv[0], we still read the
first string correctly, but the next strings did contain unexpected data.

The fix: read the whole argv (or envp) array, then copy the strings one by
one, using their addresses in argv (or envp).
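
The approach above can be sketched in plain userland C. This is an
illustration only, not the kernel code: copy_arg_strings() is a hypothetical
helper, and the real implementation has to fetch the pointers and strings
from another process' address space rather than dereference them directly.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy each string referenced by argv[0..argc-1] into out, NUL included,
 * following the pointers one by one instead of assuming the strings sit
 * contiguously after argv[0].  Returns the number of bytes written, or
 * -1 if out is too small.
 */
long
copy_arg_strings(char *const argv[], int argc, char *out, size_t outlen)
{
	size_t used = 0;

	for (int i = 0; i < argc; i++) {
		size_t len = strlen(argv[i]) + 1;	/* include the NUL */

		if (len > outlen - used)
			return -1;
		memcpy(out + used, argv[i], len);
		used += len;
	}
	return (long)used;
}
```

Because each string is located through its own pointer, a program that
repoints argv[0] at a new title no longer corrupts the strings that follow.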


# 1.83 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: rpaulo-netinet-merge-pcb-base
# 1.82 08-Sep-2006 manu

When collecting a 32-bit process' argument or environment vector, we need
to convert 32-bit pointers to the 64-bit environment.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.81 26-Jul-2006 dogcow

branches: 1.81.4;
at the request of elad, as veriexec.h has returned, revert the changes
from 2006-07-25.


# 1.80 25-Jul-2006 dogcow

mechanically go through and
s,include "veriexec.h",include <sys/verified_exec.h>,
as the former has apparently gone away.


# 1.79 24-Jul-2006 elad

some fixes:
- adapt to NVERIEXEC in init_sysctl.c.
- we now need "veriexec.h" for NVERIEXEC.
- "opt_verified_exec.h" -> "opt_veriexec.h", and include it only where
it is needed.


# 1.78 23-Jul-2006 ad

Use the LWP cached credentials where sane.


# 1.77 17-Jul-2006 ad

- Don't cast kauth_cred_t to (struct ucred *), just set pc_ucred = NULL.
- Fill ucred::cr_ref.


# 1.76 16-Jul-2006 elad

CURTAIN() -> KAUTH_GENERIC_CANSEE.


# 1.75 14-Jul-2006 elad

move security.setid_core.* to kern.coredump.setid.*, as requested by yamt@.


Revision tags: yamt-pdpolicy-base6
# 1.74 21-Jun-2006 christos

Don't leak memory on success. Allocate only the type of struct that we'll
need for efficiency.


# 1.73 20-Jun-2006 christos

don't allocate too much stuff on the stack.


Revision tags: chap-midi-nbase chap-midi-base
# 1.72 17-Jun-2006 yamt

sysctl_security_setidcorename: don't allocate MAXPATHLEN bytes on stack.


Revision tags: gdamore-uart-base
# 1.71 13-Jun-2006 yamt

branches: 1.71.2;
remove unnecessary arguments from kauth_authorize_process.
ie. make it similar to the one found in apple TN.


# 1.70 13-Jun-2006 yamt

sysctl_kern_file, sysctl_kern_file2: don't abuse kauth_authorize_process
for non-process objects.


# 1.69 13-Jun-2006 yamt

sysctl_kern_file2: fix an indent.


Revision tags: yamt-pdpolicy-base5 simonb-timecounters-base
# 1.68 14-May-2006 elad

branches: 1.68.2;
integrate kauth.


Revision tags: elad-kernelauth-base
# 1.67 17-Apr-2006 elad

Move securelevel-specific stuff to its own file.


# 1.66 14-Apr-2006 blymn

Make i/o statistics collection more generic, include tape drives and
nfs mounts in the set of devices that statistics will be reported on.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.65 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.64 26-Mar-2006 erh

When DIAGNOSTIC is defined, provide a kern.panic_now sysctl to conveniently
and reliably panic the system


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.63 01-Mar-2006 yamt

branches: 1.63.2; 1.63.4; 1.63.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.62 04-Feb-2006 yamt

for some random places, use PNBUF_GET/PUT rather than
- on-stack buffer
- malloc(MAXPATHLEN)


# 1.61 02-Feb-2006 elad

branches: 1.61.2;
implement a security.setid_core node as discussed on tech-kern@ and
tech-security@.


# 1.60 27-Jan-2006 elad

branches: 1.60.2;
remove security node sysctl objects; they are now created using CTL_CREATE.


# 1.59 26-Dec-2005 perry

branches: 1.59.2;
u_intN_t -> uintN_t


# 1.58 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.57 05-Dec-2005 christos

- make settime take timespec.
- avoid wrapping of time in settime.
- pass struct proc down so that we can log a detailed message.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.56 08-Oct-2005 yamt

sysctl_kern_proc_args: don't assume that the process is
resident while we are sleeping.


# 1.55 07-Sep-2005 elad

Implement curtain in KERN_{PROC,PROC2,FILE,FILE2,PROC_ARGS}.
While I'm here, disable curtain by default.


# 1.54 07-Sep-2005 elad

Introduce ``security.curtain'', new node for security features and
settings, and new variable for controlling access to objects based
on user-id.


# 1.53 06-Sep-2005 rpaulo

Implement kern.hardclock_ticks.


# 1.52 24-Aug-2005 simonb

Fix a tyop in a comment.


# 1.51 13-Aug-2005 blymn

Remove the tape stats from here, they caused issues on non-scsipi
architectures.


# 1.50 08-Aug-2005 blymn

Don't include tape stats functions if no devices configured.


# 1.49 07-Aug-2005 blymn

Add tape statistics gathering functions.


# 1.48 29-Jul-2005 elad

#ifdef VERIFIED_EXEC


# 1.47 16-Jul-2005 christos

defopt verified_exec.


# 1.46 17-Jun-2005 atatat

branches: 1.46.2;
Comment in new cp_id implementation was wrong since I abandoned
rewriting it in favor of some testing and then never got back to it.
It's better now.


# 1.45 16-Jun-2005 christos

Add a new sysctl 'cp_id' that returns the array of cpu id values. Requested by
me, implemented by atatat.


# 1.44 15-Jun-2005 elad

Fix sysctl handling for raise-only variables. This affected the veriexec
node entirely. Reported by Nino Dehne.


# 1.43 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.42 06-Jun-2005 jdc

Revert previous ('_ncpus' is now 'ncpus' again).
MI variable names have precedence.


# 1.41 05-Jun-2005 jdc

Rename 'ncpus' to '_ncpus', otherwise we shadow sparc/sparc64's 'ncpus'
when MULTIPROCESSOR is defined.


# 1.40 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.39 22-May-2005 elad

Add indication for number of fingerprinted files on each device.

When a table is created for a new device, a new variable is created
under the kern.veriexec.count node named "dev_<id>". For example,
dev_0, dev_3, etc.


# 1.38 19-May-2005 elad

Some changes in veriexec.

New features:

- Add a veriexec_report() routine to make most reporting consistent and
remove some common code.
- Add 'strict' mode that controls how veriexec behaves.
- Add sysctl knobs:
o kern.veriexec.verbose controls verbosity levels. Value: 0, 1.
o kern.veriexec.strict controls strict level. Values: 0, 1, 2. See
documentation in sysctl(3) for details.
o kern.veriexec.algorithms returns a string with a space separated
list of supported hashing algorithms in veriexec.
- Updated documentation in man pages for sysctl(3) and sysctl(8).

Bug fixes:

- veriexec_removechk(): Code cleanup + handle FINGERPRINT_NOTEVAL
correctly.
- exec_script(): Don't pass 0 as flag when executing a script; use the
defined VERIEXEC_INDIRECT - which is 1. Makes indirect execution
enforcement work.
- Fix some printing formats and types.


Revision tags: kent-audio2-base
# 1.37 18-Apr-2005 mrg

be explicit in the description for POSIX saved set-id that this is for
POSIX-style, not sane-style. (ie, add "POSIX " to the description.)


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.36 11-Mar-2005 atatat

branches: 1.36.2;
Revert the change that made kern.file2 and net.*.*.pcblist into nodes
instead of structs. It had other deleterious side-effects that are
rather nasty. Another solution must be found.


# 1.35 10-Mar-2005 atatat

Change types of kern.file2 and net.*.*.pcblist to NODE


# 1.34 09-Mar-2005 atatat

Add kern.file2. As kern.proc2 is to kern.proc, so is kern.file2 to
kern.file, namely a 32/64 bit clean sysctl interface to the same data.
It also borrows a few things from struct vnode (if applicable) and
from struct proc, just to tie things together a bit more.

You can walk this list "by file" or "by pid". The former method is
similar to kern.file but omits the filehead, and the latter can give
you duplicates if multiple processes have the same struct file open,
but tells you which process it is.


# 1.33 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.32 01-Oct-2004 yamt

branches: 1.32.4; 1.32.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.31 27-Jul-2004 atatat

branches: 1.31.2;
The message buffer datum instrumented by KERN_MSGBUFSIZE is actually a
long, not an int, and this causes "problems" on LP64be machines
(sparc64, etc). Assign the value to a temporary int and instrument
that instead. Should be fine until someone wants a message buffer
larger than two gigabytes.
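
A portable sketch of why this matters on LP64 big-endian machines. The byte
layout is simulated by hand so the example behaves the same on any host; all
function names are hypothetical and only illustrate the effect, they are not
taken from the kernel.

```c
#include <stdint.h>

/* Lay out a 64-bit value in big-endian order, as an LP64be long is stored. */
static void
store_be64(unsigned char buf[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		buf[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* Read four bytes as a big-endian 32-bit value, as an int consumer would. */
static uint32_t
load_be32(const unsigned char *buf)
{
	return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
	    ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}

/* Instrumenting the long directly: the reader sees only the high half. */
uint32_t
read_long_as_int(uint64_t msgbufsize)
{
	unsigned char buf[8];

	store_be64(buf, msgbufsize);
	return load_be32(buf);
}

/* Assigning to a temporary int first keeps the low half, as in the fix. */
uint32_t
read_via_temporary_int(uint64_t msgbufsize)
{
	uint32_t tmp = (uint32_t)msgbufsize;

	return tmp;
}
```

For any size below 4 GB the first function yields 0 while the second yields
the real value, which matches the "problems" described above.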


# 1.30 26-May-2004 christos

(off_t)(long) is wrong when it comes to kernel addresses [because on a 32 bit
machine if the high bit is set they turn negative]. Make an intermediate cast
to unsigned long.
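
A small sketch of the sign-extension problem, assuming int32_t stands in for
a 32-bit long and int64_t for off_t. The function names are hypothetical, and
the signed variant assumes the usual two's-complement behavior when an
out-of-range value is converted to int32_t.

```c
#include <stdint.h>

/* (off_t)(long)addr: the high bit is sign-extended into the upper half. */
int64_t
off_from_addr_signed(uint32_t kva)
{
	return (int64_t)(int32_t)kva;
}

/* (off_t)(unsigned long)addr: the value is zero-extended and stays positive. */
int64_t
off_from_addr_unsigned(uint32_t kva)
{
	return (int64_t)(uint32_t)kva;
}
```

Addresses below 0x80000000 convert identically either way; only addresses
with the high bit set, which is common for kernel addresses on 32-bit
machines, come out negative through the signed path.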


# 1.29 03-May-2004 martin

Fix a comment.
Approved by Andrew Brown.


# 1.28 23-Apr-2004 simonb

s/the the/the/ (only in sources that aren't regularly imported from
elsewhere).


# 1.27 16-Apr-2004 atatat

Prefer that kern.hostid is printed in hex, not as a signed decimal,
and avoid accidental sign-extension when setting it.


# 1.26 08-Apr-2004 atatat

Lots of sysctl descriptions (if someone wants to help out here, that
would be good) mostly copied from sysctl(3). This takes care of the
top-level, most of kern.* and hw.* (modulo the ath and bge stuff), and
all of proc.*.

If you don't want the added rodata in your kernel, use "options
SYSCTL_NO_DESCR" in your kernel config.


# 1.25 08-Apr-2004 atatat

Clear out the struct kinfo_drivers before stuffing things into it.
Avoids leaking garbage from the stack (left over from the earlier
call to sysctl_locate()).


Revision tags: netbsd-2-0-base
# 1.24 24-Mar-2004 atatat

branches: 1.24.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.23 17-Mar-2004 yamt

- move kern.somaxkva sysctl stuff from init_sysctl.c to uipc_socket.c.
- when changing its value, wakeup sokva waiters.


# 1.22 21-Feb-2004 atatat

Use KERN_PROCSLOP for struct kinfo_proc and KERN_LWPSLOP for
struct kinfo_lwp, and not vice versa.

Should solve the issue with top dying because it's unable to "allocate
memory".


# 1.21 19-Feb-2004 atatat

Use new PTRTOUINT64() macro instead of local PTRTOINT64() macro.


# 1.20 17-Jan-2004 atatat

Avoid dereferencing l...it might be NULL


# 1.19 28-Dec-2003 atatat

Sysctl functions called for "generic" nodes should forward "query"
requests (where possible), rather than returning errors.


# 1.18 28-Dec-2003 atatat

Adjust error returns in kern.cp_time when a specific processor is
being requested so that (1) the uniprocessor case and the
multiprocessor case are more similar and (2) so that we return ENOENT
when a non-existent processor is requested (which is both more
sensible and follows the general order of things anyway).


# 1.17 28-Dec-2003 atatat

Rename sysctl_kern_hostname() to sysctl_setlen() and use it also for
domainname. Note that there's no need to copy rnode since we're not
changing any of it, nor protecting anything from change.

Thanks to martin for initial work.


# 1.16 28-Dec-2003 atatat

RCSid police


# 1.15 28-Dec-2003 martin

After changing hostname, adjust hostnamelen.
This closes PR kern/23907.


# 1.14 26-Dec-2003 martin

Make kern.rtc_offset writable at securelevel <= 0.
This allows boot-time adjustment when a machine runs other OSes with
RTC == localtime.


# 1.13 20-Dec-2003 yamt

update a comment to match with the previous change (rev.1.12).


# 1.12 20-Dec-2003 yamt

restore functionality to decrease kern.maxvnodes which
has been backed out during sysctl rework.


# 1.11 12-Dec-2003 simonb

In sysctl_kern_lwp adjust offsets into the mib entries so that
they are now correct. Fixes problems with "ps -s" not working.
Also use KERN_LWPSLOP instead of KERN_PROCSLOP.

Both changes from Andrew Brown.


# 1.10 10-Dec-2003 atatat

Make kern.dump_on_panic writeable again, too


# 1.9 09-Dec-2003 atatat

Make kern.sbmax writeable again as well.

From a follow-on to PR kern/23695 by a Mr. Davis, which I missed at a
quick glance.


# 1.8 09-Dec-2003 atatat

Make kern.logsigexit writeable again.

Fixes PR kern/23695.


# 1.7 07-Dec-2003 martin

Add missing break.


# 1.6 07-Dec-2003 he

Also make declaration of sysctl_kern_maxptys() depend on NPTY > 0.
Makes the mvme68k RAMDISK kernel compile again.


# 1.5 06-Dec-2003 martin

Fix kern.cp_time for MULTIPROCESSOR kernels: calculate size of result
correctly, free original instead of incremented pointer, copy results for
n = -2 case too, so top shows correct stats.
Additionally, rearrange code for better readability (from Andrew).


# 1.4 06-Dec-2003 fvdl

Include opt_posix.h for the P1003_1B_SEMAPHORE define.
Include <machine/cpu.h> just to be sure.


# 1.3 06-Dec-2003 martin

We can not count CPUs at sysctl initialization time - so don't make
hw.ncpu an immediate value.


# 1.2 06-Dec-2003 atatat

#include "opt_multiprocessor.h"

This makes hw.ncpu and kern.cp_time work better on those platforms.


# 1.1 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.




Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.212 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.211 31-May-2016 pgoyette

branches: 1.211.2;
Add a new kern.messages sysctl to allow kernel message verbosity to be
altered after boot.

Fixes PR kern/46539 using patch submitted by Nat Sloss.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.210 09-Nov-2015 pgoyette

Whether or not the semaphore code is loaded as a module or built-in, its
sysctl data belongs with the module code. Move it from kern/init_sysctl.c
to kern/uipc_sem.c

While here, add a new sysctl variable kern.posix.semcnt (current count of
semaphores) to complement the existing kern.posix.semmax (maximum number
of semaphores).


Revision tags: nick-nhusb-base-20150921
# 1.209 25-Aug-2015 pooka

Move a bunch of sysctl nodes from init_sysctl (kitchen sink sysctl file)
to init_sysctl_base (only base kernel defs). Main motivation was to
fix sysconf(_SC_NPROCESSORS) for Rumprun. As reported by neeraj on irc,
it returned -1 before this fix, so we were doing imaginary computing.


# 1.208 07-Jul-2015 justin

Move hw.machine and hw.machine_arch sysctls to base so rump can use them

This allows uname(3) and uname(1) to work on rump kernels.


Revision tags: nick-nhusb-base-20150606
# 1.207 20-May-2015 pooka

group msgbuf sysctls with the msgbuf code
(init_sysctl.c -> subr_log.c)


# 1.206 13-May-2015 pgoyette

More preparation for modularizing the SYSVxxx options. Here we
change the kern.ipc.sysvxxx sysctls into dynamic values, so each
sub-component of SYSVxxx can declare its own availability.


# 1.205 22-Apr-2015 pooka

move clock sysctls from init_sysctl.c to kern_clock.c


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.204 03-Aug-2014 apb

branches: 1.204.4;
BUILDINFO part 2: expose sysctl kern.buildinfo


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.203 08-May-2014 hannken

Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15
# 1.202 24-Mar-2014 christos

branches: 1.202.2;
- create cpu_{g,s}etmodel() and hide cpu_model from direct access.


Revision tags: riastradh-drm2-base3
# 1.201 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.200 25-Feb-2014 justin

Add kern.{ostype,osrelease,osrevision,version} kern.domainname,
kern.rawpartition sysctl support to rump kernel.
Moved the sysctl support that is shared between rump and normal
kernels to init_sysctl_base.c as rump cannot use init_sysctl.c
in order to avoid code duplication. Agreed with pooka@.


# 1.199 17-Jan-2014 pooka

Put cprng sysctls into subr_cprng.c. Also, make sysctl_prng static
in subr_cprng and get rid of SYSCTL_PRIVATE namespace leak macro.

Fixes ping(8) when run against a standalone rump kernel due to appearance
of the kern.urandom sysctl node (in case someone was wondering ...)


# 1.198 14-Sep-2013 joerg

GC various arrays defined and used in kern_proc.c


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.197 18-Mar-2013 para

branches: 1.197.6;
calculate vnode cache size based on the resource it gets allocated from.
this stops kern.maxvnodes from being set so high that it exhausts the
available space in kmem

http://mail-index.netbsd.org/tech-kern/2013/03/08/msg015095.html


# 1.196 07-Mar-2013 matt

Add a kern.configname sysctl object.


# 1.195 21-Feb-2013 pgoyette

Move boottime50 and its associated sysctl into the compat module. As
noted on tech-kern. Should fix PR/47579.

OK christos@

Will request pull-up to 6.0 in a few days.


# 1.194 02-Feb-2013 matt

Make the inclusion of <sys/cprng.h> a private matter for sysctl. No reason
to expose the rest of the kernel to it.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.193 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


# 1.192 08-Oct-2012 pooka

put all kern socket sysctls in the same place


# 1.191 03-Oct-2012 mlelstv

Add sanity check to sysctl_kern_maxvnodes.


# 1.190 02-Jun-2012 dsl

branches: 1.190.2;
Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.
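
One way to picture this kind of compile-time check is the sketch below.
It uses C11 _Generic, which is a swapped-in technique chosen for the
illustration and not necessarily what the sysctl headers do; all SK_*
names and sk_maxproc are hypothetical.

```c
#include <stdint.h>

/* Hypothetical type tags standing in for CTLTYPE_* values. */
enum { SK_TYPE_INT = 1, SK_TYPE_QUAD = 2 };

/* Map the static type of a data pointer to a tag (0 if unknown). */
#define SK_TYPE_OF(ptr) _Generic((ptr),	\
	int *:      SK_TYPE_INT,		\
	int64_t *:  SK_TYPE_QUAD,		\
	uint64_t *: SK_TYPE_QUAD,		\
	default:    0)

/* Nonzero when the declared tag agrees with the pointer's type. */
#define SK_TYPE_MATCHES(type, ptr) ((type) == SK_TYPE_OF(ptr))

/* A mismatch such as (SK_TYPE_INT, &some_uint64) fails to compile. */
static int sk_maxproc;
_Static_assert(SK_TYPE_MATCHES(SK_TYPE_INT, &sk_maxproc),
    "declared type must match the data item");
```

Passing the address of a uint64_t while declaring an int type, the class
of bug mentioned above, would trip the static assertion in this sketch.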


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.189 07-Apr-2012 christos

remove bogus check.


Revision tags: jmcneill-usbmp-base8 jmcneill-usbmp-base7
# 1.188 10-Mar-2012 joerg

P1003_1B_SEMAPHORE is no longer optional.


Revision tags: jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.187 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-0-5-RELEASE netbsd-6-0-4-RELEASE netbsd-6-0-3-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.186 17-Dec-2011 tls

branches: 1.186.2;

Separate /dev/random pseudodevice implementation from kernel entropy pool
implementation. Rewrite pseudodevice code to use cprng_strong(9).

The new pseudodevice is cloning, so each caller gets bits from a stream
generated with its own key. Users of /dev/urandom get their generators
keyed on a "best effort" basis -- the kernel will rekey generators
whenever the entropy pool hits the high water mark -- while users of
/dev/random get their generators rekeyed every time key-length bits
are output.

The underlying cprng_strong API can use AES-256 or AES-128, but we use
AES-128 because of concerns about related-key attacks on AES-256. This
improves performance (and reduces entropy pool depletion) significantly
for users of /dev/urandom but does cause users of /dev/random to rekey
twice as often.

Also fixes various bugs (including some missing locking and a reseed-counter
overflow in the CTR_DRBG code) found while testing this.

For long reads, this generator is approximately 20 times as fast as the
old generator (dd with bs=64K yields 53MB/sec on 2GHz Core2 instead of
2.5MB/sec) and also uses a separate mutex per instance so concurrency
is greatly improved. For reads of typical key sizes for modern
cryptosystems (16-32 bytes) performance is about the same as the old
code: a little better for 32 bytes, a little worse for 16 bytes.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base
# 1.185 20-Nov-2011 tls

branches: 1.185.2;
An undocumented behavior of the sysctl kern.arandom node used to allow
sucking up to 8192 bytes out of the kernel arc4random() generator at a
time. Supposedly some very old application code uses this to rekey
other instances of RC4 in userspace (a truly great idea). Reduce the
limit to 256 bytes -- and note that it will probably be reduced to
sizeof(int) in the future, since this node is so documented.


# 1.184 19-Nov-2011 tls

First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.


Revision tags: jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.183 30-Aug-2011 bouyer

branches: 1.183.2;
Add getlabelusesmbr(), as proposed in
http://mail-index.netbsd.org/tech-userlevel/2011/08/25/msg005404.html
This is used by disk tools such as disklabel(8) to dynamically decide if
the underlying platform uses a disklabel-in-mbr-partition or not
(instead of using a compile-time list of ports).
getlabelusesmbr() reads the sysctl kern.labelusesmbr, takes its value from the
machdep #define LABELUSESMBR.
For evbmips, make LABELUSESMBR 1 if the platform uses pmon
as bootloader, and 0 (the previous value) otherwise.


# 1.182 23-Jul-2011 jym

When KERN_SA is not defined, kern.no_sa_support is a constant (1). So
add CTLFLAG_IMMEDIATE to flags. Make the macro block logically reversed so
it looks more natural when reading.

Reported by Peter Tworek on tech-kern@.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.181 24-May-2011 joerg

Add some needed __UNCONST


# 1.180 02-Apr-2011 rmind

vfs_drainvnodes: drop lwp argument, remove variable name in prototype.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.179 05-Feb-2011 christos

avoid code duplication.


# 1.178 28-Jan-2011 pooka

migrate compat32 handling with previous

pointed out by Lars Heidieker


# 1.177 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


# 1.176 22-Jan-2011 christos

Use the L_ flags instead of the P_ flags for lwps.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.175 01-Jul-2010 rmind

branches: 1.175.2; 1.175.4;
Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


# 1.174 16-Jun-2010 pooka

Set kinfo_lwp to 0 before filling it so that if someone removes
variable assignments from here, kernel memory does not leak to
userspace.

Bug found, a little bit surprisingly, by the atf ps test, which failed
because the -o holdcnt column was too wide: the contents displayed
were garbage.
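
A minimal sketch of the pattern, using a hypothetical structure and
fill routine in place of kinfo_lwp and its filler. Zeroing first means
any field the routine forgets to assign reads as 0 instead of stale
memory.

```c
#include <string.h>

/* Hypothetical stand-in for the structure returned to userspace. */
struct kinfo_sketch {
	int	pid;
	int	holdcnt;
	char	name[16];
};

static void
fill_sketch(struct kinfo_sketch *ki, int pid, const char *name)
{
	/* Clear everything first, including padding and unset fields. */
	memset(ki, 0, sizeof(*ki));

	ki->pid = pid;
	/* The last byte stays 0 from memset, so name is terminated. */
	strncpy(ki->name, name, sizeof(ki->name) - 1);
	/* holdcnt is deliberately not assigned here: it reads as 0. */
}
```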


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.173 13-Feb-2010 yamt

branches: 1.173.2;
sysctl_doeproc: don't follow a possibly stale pointer.


Revision tags: uebayasi-xip-base
# 1.172 13-Jan-2010 pooka

branches: 1.172.2;
Minimize unnecessary differences in rump.


# 1.171 24-Dec-2009 elad

When reporting open files using sysctl, don't use 'filehead' to fetch files,
as we don't have a process context to authorize on. Instead, traverse the
file descriptor table of each process -- as we already do in one case.

Introduce a "marker" we can use to mark files we've seen in an iteration, as
the same file can be referenced more than once.

Hopefully this availability of filtering by process also makes life easier
for those who are interested in implementing process "containers" etc.


Revision tags: matt-premerge-20091211
# 1.170 12-Dec-2009 dsl

Report L_INMEM in the lwp info as well.


# 1.169 12-Dec-2009 dsl

Always set L_INMEM to maintain binary compatibility.


Revision tags: jym-xensuspend-nbase
# 1.168 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.167 16-Sep-2009 pooka

Chop init_sysctl into base nodes (init_sysctl_base.c) and the
kitchen sink (init_sysctl.c). Further surgery may be needed down
the line.


Revision tags: yamt-nfs-mp-base8
# 1.166 11-Sep-2009 apb

Expose the kernel's boothowto(9) variable through the sysctl
kern.boothowto variable.

Part of the /etc/rc silent changes requested in PR 41946
and proposed in tech-userlevel.


Revision tags: yamt-nfs-mp-base7
# 1.165 16-Aug-2009 christos

provide compatibility for the older variant of kern.consdev, which used
a 32 bit dev_t. Reported by mrg.


Revision tags: jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.164 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


# 1.163 16-May-2009 yamt

sysctl_doeproc:
- simplify.
- KERN_PROC: fix possible stale proc pointer dereference.
- KERN_PROC: don't do copyout with proc_lock held.


Revision tags: yamt-nfs-mp-base4 jym-xensuspend-base
# 1.162 12-May-2009 yamt

don't forget to skip marker processes.


# 1.161 04-May-2009 yamt

sysctl_doeproc: fix a bug in rev.1.135.
don't forget to mark our marker process PK_MARKER.
this fixes crashes in sched_pstats, etc.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.160 29-Mar-2009 mrg

- add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.

- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.

- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)

- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)

- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.

- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but that it's best left
as part of the full-update of compat/freebsd.)


this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.

tested on i386 and sparc64, build tested on several other platforms.

thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)


# 1.159 11-Mar-2009 mrg

like KERN_FILE2: *do* update "needed" when there is no count. we want
userland to know what sort of size to provide..

while here, slightly normalise the previous to init_sysctl.c.


# 1.158 11-Mar-2009 mrg

always calculate "needed" for KERN_FILE2 calls. this allows a caller
to get an estimate of the needed space, like the intention is.
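
The size-probe idea can be sketched as follows. This is a hypothetical
handler over plain int records, not the KERN_FILE2 code: it always
reports the full size through *needed, so a caller can probe with no
buffer and then allocate.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical handler: copy as many whole records as fit into dst,
 * but always report the total size required through *needed.
 * Returns the number of records copied.
 */
static size_t
copy_records(const int *src, size_t nrec, int *dst, size_t dstlen,
    size_t *needed)
{
	size_t total = nrec * sizeof(*src);
	size_t fit;

	*needed = total;		/* set even when dst is NULL */
	if (dst == NULL)
		return 0;		/* size probe only */

	fit = dstlen < total ? dstlen : total;
	fit -= fit % sizeof(*src);	/* whole records only */
	memcpy(dst, src, fit);
	return fit / sizeof(*src);
}
```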


# 1.157 08-Mar-2009 ad

Don't bother with file_t::f_iflags any more, as it's not used.
Noted by mrg@.


Revision tags: nick-hppapmap-base2
# 1.156 13-Feb-2009 apb

Use "defopt MODULAR" in sys/conf/files, and #include "opt_modular.h"
in all kernel sources that use the MODULAR option.
Proposed in tech-kern on 18 Jan 2009.


Revision tags: mjf-devfs2-base
# 1.155 17-Jan-2009 cegger

branches: 1.155.2;
whitespace nit


# 1.154 17-Jan-2009 yamt

malloc -> kmem_alloc.


# 1.153 11-Jan-2009 christos

merge christos-time_t


Revision tags: christos-time_t-nbase christos-time_t-base
# 1.152 29-Dec-2008 pooka

Rename specfs_lock as device_lock and move it from specfs to devsw.
Relaxes kernel dependency on vfs.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.151 28-Nov-2008 elad

PR/40002: Daniel Horecki: sockstat doesn't work for user with sysctl
security.curtain=1

If the kauth call failed, we'd silently continue the loop, but the error
code would remain and eventually "leak" to userspace. Reset the error to
zero when continuing.
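
The shape of the bug and the fix, as a sketch. sk_authorize() is a
hypothetical stand-in for the kauth call and simply denies odd ids; the
loop skips denied entries and clears the error so that a denial on the
last entry is not returned to the caller.

```c
#include <stddef.h>

/* Hypothetical authorization check: nonzero means "denied". */
static int
sk_authorize(int id)
{
	return (id % 2) != 0;
}

/* Count entries the caller may see; denied entries are skipped. */
static int
sk_count_visible(const int *ids, size_t n, size_t *count)
{
	int error = 0;
	size_t i;

	*count = 0;
	for (i = 0; i < n; i++) {
		error = sk_authorize(ids[i]);
		if (error != 0) {
			error = 0;	/* a skipped entry is not a failure */
			continue;
		}
		(*count)++;
	}
	return error;
}
```

Without the reset, an input whose last id is denied would return a
nonzero error even though the visible entries were reported correctly.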

Tested by snj@ and myself. Okay snj@.


# 1.150 12-Nov-2008 ad

Allow the POSIX semaphore code to be loaded as a module.


Revision tags: netbsd-5-base matt-mips64-base2
# 1.149 22-Oct-2008 ad

branches: 1.149.2; 1.149.4;
Set kern.posix_semaphores at runtime so it can be a module.
(Picked wrong header the last time.)


# 1.148 22-Oct-2008 ad

Set kern.posix_semaphores at runtime so it can be a module.


Revision tags: haad-dm-base1
# 1.147 19-Oct-2008 christos

rename proc_representative_lwp to proc_active_lwp and clarify it is for
ps display purposes. suggested by rmind.


# 1.146 19-Oct-2008 christos

Select a "representative" lwp instead of the first lwp in the list. The
first lwp in the list is the last created and in the firefox and gtk-gnash
case this is usually a zombie, so the status in ps was ZLl. This now picks
the lwp in order ONPROC > RUN > SLEEP > STOP > SUSPENDED > IDL > DEAD > ZOMB
and breaks ties using cpticks.
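
A sketch of that selection rule. The enum mirrors the order given above
(best first) under hypothetical SK_* names, and the tie-break direction
is an assumption: the entry only says ties are broken using cpticks, so
this sketch prefers the larger value.

```c
#include <stddef.h>

/* States in preference order, best first (hypothetical names). */
enum sk_lstat {
	SK_ONPROC, SK_RUN, SK_SLEEP, SK_STOP,
	SK_SUSPENDED, SK_IDL, SK_DEAD, SK_ZOMB
};

struct sk_lwp {
	enum sk_lstat	stat;
	unsigned int	cpticks;
};

/* Return the most "representative" LWP, or NULL if n == 0. */
static const struct sk_lwp *
sk_pick_lwp(const struct sk_lwp *l, size_t n)
{
	const struct sk_lwp *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (best == NULL ||
		    l[i].stat < best->stat ||
		    (l[i].stat == best->stat &&
		     l[i].cpticks > best->cpticks))	/* assumed tie-break */
			best = &l[i];
	}
	return best;
}
```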


# 1.145 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.144 15-Jul-2008 christos

make l_flags contain more stuff. Fixes top thread display where system processes
were always displayed.


# 1.143 02-Jul-2008 rmind

branches: 1.143.2;
Remove proc_representative_lwp(), use a simple LIST_FIRST() instead.
OK by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 wrstuden-revivesa-base
# 1.142 16-Jun-2008 ad

PR kern/38927: processes getting stuck in uvm_map (cv_timedwait), hanging
machine

Assume that a vnode (and associated data structures) costs 2kB in the
worst imaginable case. Don't allow sysctl to set desiredvnodes to a
value that would use more than 75% of KVA or 75% of physical memory.
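
The arithmetic behind that limit, as a sketch. The 2kB cost and the 75%
figure come from the entry above; the function name and the way the two
bounds are combined (taking the smaller one) are assumptions made for
the illustration.

```c
#include <stdint.h>

#define SK_VNODE_COST	2048	/* worst-case bytes per vnode */

/* Largest vnode count that stays within 75% of both resources. */
static uint64_t
sk_max_vnodes(uint64_t kva_bytes, uint64_t phys_bytes)
{
	uint64_t limit = kva_bytes < phys_bytes ? kva_bytes : phys_bytes;

	/* Divide before multiplying to avoid overflow on large inputs. */
	return (limit / 4 * 3) / SK_VNODE_COST;
}

/* Nonzero if a requested value would be accepted. */
static int
sk_vnodes_ok(uint64_t desired, uint64_t kva_bytes, uint64_t phys_bytes)
{
	return desired <= sk_max_vnodes(kva_bytes, phys_bytes);
}
```

For example, with 1GB of KVA and 512MB of physical memory the bound is
75% of 512MB, i.e. 384MB, which at 2kB per vnode allows 196608 vnodes.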


# 1.141 16-Jun-2008 ad

- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


Revision tags: yamt-pf42-base3
# 1.140 31-May-2008 ad

branches: 1.140.2;
Kill devsw_lock and just use specfs_lock. The two would need merging
in order to prevent unload of modules when a device that they provide
is still open.


# 1.139 25-May-2008 christos

don't forget to fill in the emulation.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.138 12-May-2008 ad

Use cpu_index(), not ci_cpuid.


# 1.137 30-Apr-2008 ad

branches: 1.137.2;
KERN_FILE_BYPID: fix locking botch.


# 1.136 29-Apr-2008 ad

Don't try grabbing a zombie's p_reflock.


# 1.135 29-Apr-2008 ad

PR kern/37917 /bin/ps no longer shows zombies


# 1.134 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.133 24-Apr-2008 ad

branches: 1.133.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.132 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.131 05-Apr-2008 yamt

branches: 1.131.2;
- l_wmesg is not always valid. check l_wchan when using l_wmesg.
should fix a crash reported by Juan RP on current-users@.
- ttyinfo: lock lwp when accessing l_wmesg.
- fill_lwp: add an assertion.


# 1.130 04-Apr-2008 cegger

use device_xname() where appropriate
OK martin


# 1.129 02-Apr-2008 xtraeme

Revert rev 1.126-1.128. The original code was correct and rmind and I
didn't look correctly at them.


# 1.128 01-Apr-2008 xtraeme

When copying l_name and l_wmesg use KI_LNAMELEN and KI_WMESGLEN
respectively, so that we don't care if l_name/wmesg is longer
than kl_name/wmesg and the KASSERTs added in previous can go away.


# 1.127 01-Apr-2008 xtraeme

Fix previous: use the length of l->l_foo not kl->l_foo and add
two KASSERTs to check for max length limits before copying.

As suggested by rmind@.


# 1.126 01-Apr-2008 xtraeme

fill_lwp: when copying l_wmesg and l_name, use the size of the string
not of the variable.

Found and ok by rmind@.


# 1.125 27-Mar-2008 ad

branches: 1.125.2;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.124 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.123 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.122 30-Jan-2008 ad

branches: 1.122.2; 1.122.6;
Another locking botch.


# 1.121 28-Jan-2008 ad

More file/proc locking fixes.


Revision tags: bouyer-xeni386-nbase
# 1.120 23-Jan-2008 elad

Tons of process scope changes.

- Add a KAUTH_PROCESS_SCHEDULER action, to handle scheduler related
requests, and add specific requests for set/get scheduler policy and
set/get scheduler parameters.

- Add a KAUTH_PROCESS_KEVENT_FILTER action, to handle kevent(2) related
requests.

- Add a KAUTH_DEVICE_TTY_STI action to handle requests to TIOCSTI.

- Add requests for the KAUTH_PROCESS_CANSEE action, indicating what
process information is being looked at (entry itself, args, env,
open files).

- Add requests for the KAUTH_PROCESS_RLIMIT action indicating set/get.

- Add requests for the KAUTH_PROCESS_CORENAME action indicating set/get.

- Make bsd44 secmodel code handle the newly added requests appropriately.

All of the above make it possible to issue finer-grained kauth(9) calls in
many places, removing some KAUTH_GENERIC_ISSUSER requests.

- Remove the "CAN" from KAUTH_PROCESS_CAN{KTRACE,PROCFS,PTRACE,SIGNAL}.

Discussed with christos@ and yamt@.


Revision tags: bouyer-xeni386-base
# 1.119 12-Jan-2008 ad

sysctl_kern_proc_args: avoid zero length allocation.


Revision tags: matt-armv6-base
# 1.118 07-Jan-2008 ad

Patch up sysctl locking:

- Lock processes, credentials, filehead etc correctly.
- Acquire a read hold on sysctl_treelock if only doing a query.
- Don't wire down the output buffer. It doesn't work correctly and the code
regularly does long term sleeps with it held - it's not worth it.
- Don't hold locks other than sysctl_lock while doing copyout().
- Drop sysctl_lock while doing copyout / allocating memory in a few places.
- Don't take kernel_lock for sysctl.
- Fix a number of bugs spotted along the way


# 1.117 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.116 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.115 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.114 10-Dec-2007 elad

- Use KAUTH_ARG() instead of casts,
- Don't ignore return value of settime() in sysctl_kern_rtc_offset(), as
suggested by yamt@.

Note: the kauth(9) call in sysctl_kern_rtc_offset() is bogus, but this will
be addressed separately.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base jmcneill-pm-base reinoud-bufcleanup-base
# 1.113 06-Nov-2007 ad

branches: 1.113.2; 1.113.4; 1.113.6;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.112 19-Oct-2007 ad

branches: 1.112.2;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h


Revision tags: yamt-x86pmap-base4
# 1.111 16-Oct-2007 christos

branches: 1.111.2;
Don't fail to produce the argument vector if the program has modified it
by deleting arguments. This is a popular practice, and failing means that
ps(1) prints (programname). For example, this is what XtOpenDisplay() does
with -geometry. This used to work before 2.0H, and the behavior is allowed and
hinted by POSIX. Found out by Anon Ymous.


# 1.110 16-Oct-2007 christos

- fix comment sentence capitalization.
- whitespace cleanup.
No functional changes.


# 1.109 15-Oct-2007 ad

Add _SC_NPROCESSORS_ONLN and _SC_NPROCESSORS_CONF for sysconf(). These
are extensions but are provided by many Unix systems.
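
A minimal userland sketch of querying these two values follows. The helper
name cpu_count() is hypothetical, and falling back to 1 when sysconf() fails
is an assumption made for the example, not something this change specifies.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the processor count for one of the two
 * sysconf() names mentioned above, or 1 if the query fails.
 */
long
cpu_count(int name)
{
	long n;

	/* Only the two names discussed in this revision are accepted. */
	assert(name == _SC_NPROCESSORS_ONLN || name == _SC_NPROCESSORS_CONF);

	n = sysconf(name);	/* -1 means the name is unsupported or failed */
	return n > 0 ? n : 1;
}
```

A caller would typically use cpu_count(_SC_NPROCESSORS_ONLN) to size a
worker pool and cpu_count(_SC_NPROCESSORS_CONF) to size per-CPU tables.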


Revision tags: yamt-x86pmap-base3
# 1.108 13-Oct-2007 rmind

sysctl_kern_lwp: Use a correct variable when rechecking if LWP still
exists after relocking. Found via CID: 4689. OK by <dsl>.


Revision tags: vmlocking-base
# 1.107 08-Oct-2007 ad

Merge from vmlocking: don't hold scheduler locks across copyout().


Revision tags: yamt-x86pmap-base2
# 1.106 28-Sep-2007 joerg

Add kern.no_sa_support to easily detect whether a kernel supports
Scheduler Activation or not. This is a negative name as ld.so.conf
conditionals treat undefined sysctls like 0.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.105 15-Aug-2007 ad

branches: 1.105.2; 1.105.4;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.104 06-Aug-2007 yamt

branches: 1.104.2;
remove a homegrown definition of CPU_INFO_FOREACH.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.103 09-Jul-2007 ad

branches: 1.103.2; 1.103.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.102 30-Jun-2007 dsl

Add a flags parameter to kauth_cred_get/setgroups() so that sys_set/setgroups
can copy directly to/from userspace.
Avoids exposing the implementation of the group list as an array to code
outside kern_auth.c.
compat code and man page need updating.


# 1.101 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.100 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


Revision tags: thorpej-atomic-base
# 1.99 11-Mar-2007 ad

branches: 1.99.2;
Add the LWP's runtime to kinfo_lwp.


# 1.98 09-Mar-2007 ad

branches: 1.98.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


Revision tags: ad-audiomp-base
# 1.97 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.96 15-Feb-2007 ad

branches: 1.96.2;
Count the number of CPUs at boot and stash in 'ncpu'. Eventually should
have each CPU register at attach, so we can figure out the topology for
the scheduler.


Revision tags: post-newlock2-merge
# 1.95 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase newlock2-base
# 1.94 22-Jan-2007 elad

Don't rely on KAUTH_PROCESS_CANSEE for environment just yet,
otherwise we're allowing anyone to read the environment unless
curtain is enabled.

From yamt@.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 netbsd-4-base
# 1.93 27-Nov-2006 elad

branches: 1.93.2;
Move Veriexec's sysctl(9) setup routine and helper to kern_verifiedexec.c.


# 1.92 25-Nov-2006 christos

PR/34837: Mindaguas: Add SysV SHM dynamic reallocation and locking to the
physical memory


# 1.91 01-Nov-2006 christos

implement kern.arandom properly, instead of lying about it and only filling
the first 4 bytes of the array with random data.


# 1.90 29-Oct-2006 christos

add the emulation in kinfo_proc2


Revision tags: yamt-splraiseipl-base2
# 1.89 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.88 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


# 1.87 24-Sep-2006 dogcow

correct dcopyout #define for !KTRACE case.


# 1.86 23-Sep-2006 manu

Add a -t+S flag to ktrace for tracing activity related to sysctl. MIB
names will be displayed, with data read and written as well.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.85 13-Sep-2006 elad

branches: 1.85.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.84 10-Sep-2006 manu

When getting the program argument or environment strings, we previously
assumed that all the strings were stored in a row, separated by NUL chars,
at the address pointed to by argv[0] (or envp[0]).

This was wrong: if the program changed argv[0], we still read the
first string correctly, but the next strings did contain unexpected data.

The fix: read the whole argv (or envp) array, then copy the strings one by
one, using their addresses in argv (or envp).
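
The approach described in this revision can be sketched in userland C as
follows. This is not the kernel code: pack_args() is a hypothetical helper
that follows each pointer in the vector separately, so it stays correct when
the strings are no longer stored back to back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy argc strings into buf, NUL-separated, by
 * following each argv[i] pointer on its own rather than assuming the
 * strings are contiguous starting at argv[0].
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t
pack_args(char *const argv[], size_t argc, char *buf, size_t buflen)
{
	size_t used = 0;

	assert(buf != NULL || buflen == 0);

	for (size_t i = 0; i < argc; i++) {
		size_t len = strlen(argv[i]) + 1;	/* include the NUL */

		if (len > buflen - used)
			return 0;
		memcpy(buf + used, argv[i], len);
		used += len;
	}
	return used;
}
```

Because every string is located through its own pointer, a program that
repoints or reorders its argument vector still yields a consistent result.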


# 1.83 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: rpaulo-netinet-merge-pcb-base
# 1.82 08-Sep-2006 manu

When collecting a 32-bit process' argument or environment vector, we need
to convert 32-bit pointers to the 64-bit environment


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.81 26-Jul-2006 dogcow

branches: 1.81.4;
at the request of elad, as veriexec.h has returned, revert the changes
from 2006-07-25.


# 1.80 25-Jul-2006 dogcow

mechanically go through and
s,include "veriexec.h",include <sys/verified_exec.h>,
as the former has apparently gone away.


# 1.79 24-Jul-2006 elad

some fixes:
- adapt to NVERIEXEC in init_sysctl.c.
- we now need "veriexec.h" for NVERIEXEC.
- "opt_verified_exec.h" -> "opt_veriexec.h", and include it only where
it is needed.


# 1.78 23-Jul-2006 ad

Use the LWP cached credentials where sane.


# 1.77 17-Jul-2006 ad

- Don't cast kauth_cred_t to (struct ucred *), just set pc_ucred = NULL.
- Fill ucred::cr_ref.


# 1.76 16-Jul-2006 elad

CURTAIN() -> KAUTH_GENERIC_CANSEE.


# 1.75 14-Jul-2006 elad

move security.setid_core.* to kern.coredump.setid.*, as requested by yamt@.


Revision tags: yamt-pdpolicy-base6
# 1.74 21-Jun-2006 christos

Don't leak memory on success. Allocate only the type of struct that we'll
need for efficiency.


# 1.73 20-Jun-2006 christos

don't allocate too much stuff on the stack.


Revision tags: chap-midi-nbase chap-midi-base
# 1.72 17-Jun-2006 yamt

sysctl_security_setidcorename: don't allocate MAXPATHLEN bytes on stack.


Revision tags: gdamore-uart-base
# 1.71 13-Jun-2006 yamt

branches: 1.71.2;
remove unnecessary arguments from kauth_authorize_process.
ie. make it similar to the one found in apple TN.


# 1.70 13-Jun-2006 yamt

sysctl_kern_file, sysctl_kern_file2: don't abuse kauth_authorize_process
for non-process objects.


# 1.69 13-Jun-2006 yamt

sysctl_kern_file2: fix an indent.


Revision tags: yamt-pdpolicy-base5 simonb-timecounters-base
# 1.68 14-May-2006 elad

branches: 1.68.2;
integrate kauth.


Revision tags: elad-kernelauth-base
# 1.67 17-Apr-2006 elad

Move securelevel-specific stuff to its own file.


# 1.66 14-Apr-2006 blymn

Make i/o statistics collection more generic, include tape drives and
nfs mounts in the set of devices that statistics will be reported on.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.65 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.64 26-Mar-2006 erh

When DIAGNOSTIC is defined, provide a kern.panic_now sysctl to conveniently
and reliably panic the system


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.63 01-Mar-2006 yamt

branches: 1.63.2; 1.63.4; 1.63.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.62 04-Feb-2006 yamt

for some random places, use PNBUF_GET/PUT rather than
- on-stack buffer
- malloc(MAXPATHLEN)


# 1.61 02-Feb-2006 elad

branches: 1.61.2;
implement a security.setid_core node as discussed on tech-kern@ and
tech-security@.


# 1.60 27-Jan-2006 elad

branches: 1.60.2;
remove security node sysctl objects; they are now created using CTL_CREATE.


# 1.59 26-Dec-2005 perry

branches: 1.59.2;
u_intN_t -> uintN_t


# 1.58 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.57 05-Dec-2005 christos

- make settime take timespec.
- avoid wrapping of time in settime.
- pass struct proc down so that we can log a detailed message.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.56 08-Oct-2005 yamt

sysctl_kern_proc_args: don't assume that the process is
resident while we are sleeping.


# 1.55 07-Sep-2005 elad

Implement curtain in KERN_{PROC,PROC2,FILE,FILE2,PROC_ARGS}.
While I'm here, disable curtain by default.


# 1.54 07-Sep-2005 elad

Introduce ``security.curtain'', new node for security features and
settings, and new variable for controlling access to objects based
on user-id.


# 1.53 06-Sep-2005 rpaulo

Implement kern.hardclock_ticks.


# 1.52 24-Aug-2005 simonb

Fix a tyop in a comment.


# 1.51 13-Aug-2005 blymn

Remove the tape stats from here, they caused issues on non-scsipi
architectures.


# 1.50 08-Aug-2005 blymn

Don't include tape stats functions if no devices configured.


# 1.49 07-Aug-2005 blymn

Add tape statistics gathering functions.


# 1.48 29-Jul-2005 elad

#ifdef VERIFIED_EXEC


# 1.47 16-Jul-2005 christos

defopt verified_exec.


# 1.46 17-Jun-2005 atatat

branches: 1.46.2;
Comment in new cp_id implementation was wrong since I abandoned
rewriting it in favor of some testing and then never got back to it.
It's better now.


# 1.45 16-Jun-2005 christos

Add a new sysctl 'cp_id' that returns the array of cpu id values. Requested by
me, implemented by atatat.


# 1.44 15-Jun-2005 elad

Fix sysctl handling for raise-only variables. This affected the veriexec
node entirely. Reported by Nino Dehne.


# 1.43 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.42 06-Jun-2005 jdc

Revert previous ('_ncpus' is now 'ncpus' again).
MI variable names have precedence.


# 1.41 05-Jun-2005 jdc

Rename 'ncpus' to '_ncpus', otherwise we shadow sparc/sparc64's 'ncpus'
when MULTIPROCESSOR is defined.


# 1.40 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.39 22-May-2005 elad

Add indication for number of fingerprinted files on each device.

When a table is created for a new device, a new variable is created
under the kern.veriexec.count node named "dev_<id>". For example,
dev_0, dev_3, etc.


# 1.38 19-May-2005 elad

Some changes in veriexec.

New features:

- Add a veriexec_report() routine to make most reporting consistent and
remove some common code.
- Add 'strict' mode that controls how veriexec behaves.
- Add sysctl knobs:
o kern.veriexec.verbose controls verbosity levels. Value: 0, 1.
o kern.veriexec.strict controls strict level. Values: 0, 1, 2. See
documentation in sysctl(3) for details.
o kern.veriexec.algorithms returns a string with a space separated
list of supported hashing algorithms in veriexec.
- Updated documentation in man pages for sysctl(3) and sysctl(8).

Bug fixes:

- veriexec_removechk(): Code cleanup + handle FINGERPRINT_NOTEVAL
correctly.
- exec_script(): Don't pass 0 as flag when executing a script; use the
defined VERIEXEC_INDIRECT - which is 1. Makes indirect execution
enforcement work.
- Fix some printing formats and types.


Revision tags: kent-audio2-base
# 1.37 18-Apr-2005 mrg

be explicit in the description for POSIX saved set-id that this is for
POSIX-style, not sane-style. (ie, add "POSIX " to the description.)


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.36 11-Mar-2005 atatat

branches: 1.36.2;
Revert the change that made kern.file2 and net.*.*.pcblist into nodes
instead of structs. It had other deleterious side-effects that are
rather nasty. Another solution must be found.


# 1.35 10-Mar-2005 atatat

Change types of kern.file2 and net.*.*.pcblist to NODE


# 1.34 09-Mar-2005 atatat

Add kern.file2. As kern.proc2 is to kern.proc, so is kern.file2 to
kern.file, namely a 32/64 bit clean sysctl interface to the same data.
It also borrows a few things from struct vnode (if applicable) and
from struct proc, just to tie things together a bit more.

You can walk this list "by file" or "by pid". The former method is
similar to kern.file but omits the filehead, and the latter can give
you duplicates if multiple processes have the same struct file open,
but tells you which process it is.


# 1.33 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.32 01-Oct-2004 yamt

branches: 1.32.4; 1.32.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.31 27-Jul-2004 atatat

branches: 1.31.2;
The message buffer datum instrumented by KERN_MSGBUFSIZE is actually a
long, not an int, and this causes "problems" on LP64be machines
(sparc64, etc). Assign the value to a temporary int and instrument
that instead. Should be fine until someone wants a message buffer
larger than two gigabytes.
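
The problem described in this revision can be illustrated with a small
portable sketch. The function names are hypothetical; the byte layouts are
simulated explicitly so the example behaves the same on any host.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * What a reader sees if it treats the first four bytes of a big-endian
 * 64-bit object as a 32-bit integer: the high half of the value.
 */
uint32_t
leading_word_big_endian(uint64_t value)
{
	unsigned char bytes[8];

	/* Lay the value out most significant byte first. */
	for (int i = 0; i < 8; i++)
		bytes[i] = (unsigned char)(value >> (56 - 8 * i));

	return ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
	    ((uint32_t)bytes[2] << 8) | (uint32_t)bytes[3];
}

/* On a little-endian layout the first four bytes hold the low half. */
uint32_t
leading_word_little_endian(uint64_t value)
{
	return (uint32_t)(value & 0xffffffffu);
}

/* The fix in spirit: copy the long into a temporary int and export that. */
int
narrow_to_int(long value)
{
	assert(value >= 0 && value <= INT_MAX);
	return (int)value;
}
```

For a 16 KB buffer the big-endian reader would see 0 while the
little-endian reader happens to see the right answer, which is why the bug
only showed up on LP64 big-endian machines.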


# 1.30 26-May-2004 christos

(off_t)(long) is wrong when it comes to kernel addresses [because on a 32 bit
machine if the high bit is set they turn negative]. Make an intermediate cast
to unsigned long.
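
A sketch of the cast problem, using fixed-width types as hypothetical
stand-ins for a 32-bit machine's long, unsigned long and a 64-bit off_t.
The type and function names below are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t  ilp32_long;	/* stand-in for long on a 32-bit machine */
typedef uint32_t ilp32_ulong;	/* stand-in for unsigned long */
typedef int64_t  wide_off;	/* stand-in for a 64-bit off_t */

/*
 * The old pattern, (off_t)(long)addr: an address with the high bit set
 * becomes a negative long and is then sign-extended. (Converting an
 * out-of-range value to int32_t is implementation-defined in C; common
 * compilers wrap it to the two's complement value.)
 */
wide_off
addr_to_off_signed(uint32_t kaddr)
{
	return (wide_off)(ilp32_long)kaddr;
}

/* The fixed pattern: go through the unsigned type so it zero-extends. */
wide_off
addr_to_off_unsigned(uint32_t kaddr)
{
	wide_off off = (wide_off)(ilp32_ulong)kaddr;

	assert(off >= 0);
	return off;
}
```

With a typical 32-bit kernel address such as 0xc0000000, the first function
yields a negative offset and the second yields the address unchanged.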


# 1.29 03-May-2004 martin

Fix a comment.
Approved by Andrew Brown.


# 1.28 23-Apr-2004 simonb

s/the the/the/ (only in sources that aren't regularly imported from
elsewhere).


# 1.27 16-Apr-2004 atatat

Prefer that kern.hostid is printed in hex, not as a signed decimal,
and avoid accidental sign-extension when setting it.


# 1.26 08-Apr-2004 atatat

Lots of sysctl descriptions (if someone wants to help out here, that
would be good) mostly copied from sysctl(3). This takes care of the
top-level, most of kern.* and hw.* (modulo the ath and bge stuff), and
all of proc.*.

If you don't want the added rodata in your kernel, use "options
SYSCTL_NO_DESCR" in your kernel config.


# 1.25 08-Apr-2004 atatat

Clear out the struct kinfo_drivers before stuffing things into it.
Avoids leaking garbage from the stack (left over from the earlier
call to sysctl_locate()).


Revision tags: netbsd-2-0-base
# 1.24 24-Mar-2004 atatat

branches: 1.24.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.23 17-Mar-2004 yamt

- move kern.somaxkva sysctl stuff from init_sysctl.c to uipc_socket.c.
- when changing its value, wakeup sokva waiters.


# 1.22 21-Feb-2004 atatat

Use KERN_PROCSLOP for struct kinfo_proc and KERN_LWPSLOP for
struct kinfo_lwp, and not vice versa.

Should solve the issue with top dying because it's unable to "allocate
memory".


# 1.21 19-Feb-2004 atatat

Use new PTRTOUINT64() macro instead of local PTRTOINT64() macro.


# 1.20 17-Jan-2004 atatat

Avoid dereferencing l...it might be NULL


# 1.19 28-Dec-2003 atatat

Sysctl functions called for "generic" nodes should forward "query"
requests (where possible), rather than returning errors.


# 1.18 28-Dec-2003 atatat

Adjust error returns in kern.cp_time when a specific processor is
being requested so that (1) the uniprocessor case and the
multiprocessor case are more similar and (2) so that we return ENOENT
when a non-existent processor is requested (which is both more
sensible and follows the general order of things anyway).


# 1.17 28-Dec-2003 atatat

Rename sysctl_kern_hostname() to sysctl_setlen() and use it also for
domainname. Note that there's no need to copy rnode since we're not
changing any of it, nor protecting anything from change.

Thanks to martin for initial work.


# 1.16 28-Dec-2003 atatat

RCSid police


# 1.15 28-Dec-2003 martin

After changing hostname, adjust hostnamelen.
This closes PR kern/23907.


# 1.14 26-Dec-2003 martin

Make kern.rtc_offset writable at securelevel <= 0.
This allows boot-time adjustment when a machine runs other OSes with
RTC == localtime.


# 1.13 20-Dec-2003 yamt

update a comment to match with the previous change (rev.1.12).


# 1.12 20-Dec-2003 yamt

restore functionality to decrease kern.maxvnodes which
has been backed out during sysctl rework.


# 1.11 12-Dec-2003 simonb

In sysctl_kern_lwp adjust offsets into the mib entries so that
they are now correct. Fixes problems with "ps -s" not working.
Also use KERN_LWPSLOP instead of KERN_PROCSLOP.

Both changes from Andrew Brown.


# 1.10 10-Dec-2003 atatat

Make kern.dump_on_panic writeable again, too


# 1.9 09-Dec-2003 atatat

Make kern.sbmax writeable again as well.

From a follow-on to PR kern/23695 by a Mr. Davis, which I missed at a
quick glance.


# 1.8 09-Dec-2003 atatat

Make kern.logsigexit writeable again.

Fixes PR kern/23695.


# 1.7 07-Dec-2003 martin

Add missing break.


# 1.6 07-Dec-2003 he

Also make declaration of sysctl_kern_maxptys() depend on NPTY > 0.
Makes the mvme68k RAMDISK kernel compile again.


# 1.5 06-Dec-2003 martin

Fix kern.cp_time for MULTIPROCESSOR kernels: calculate size of result
correctly, free original instead of incremented pointer, copy results for
n = -2 case too, so top shows correct stats.
Additionally, rearrange code for better readability (from Andrew).


# 1.4 06-Dec-2003 fvdl

Include opt_posix.h for the P1003_1B_SEMAPHORE define.
Include <machine/cpu.h> just to be sure.


# 1.3 06-Dec-2003 martin

We can not count CPUs at sysctl initialization time - so don't make
hw.ncpu an immediate value.


# 1.2 06-Dec-2003 atatat

#include "opt_multiprocessor.h"

This makes hw.ncpu and kern.cp_time work better on those platforms.


# 1.1 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.226 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.225 22-Mar-2020 ad

Merge vfs_cache.c from the ad-namecache branch. With this the namecache
index becomes per-directory (initially, a red-black tree). The remaining
changes on the branch to namei()/getcwd() will be merged in the future.


Revision tags: is-mlppp-base ad-namecache-base3 ad-namecache-base2
# 1.224 18-Jan-2020 skrll

Use 4K pages on ARM_MMU_EXTENDED platforms (all armv[67] except RPI) by
creating a new pool l1ttpl for the userland L1 translation table which
needs to be 8KB and 8KB aligned.

Limit the pool to maxproc and add hooks to allow the sysctl changing of
maxproc to adjust the pool.

This comes at a 5% performance penalty for build.sh -j8 kernel on a
Tegra TK1.


Revision tags: ad-namecache-base1 ad-namecache-base
# 1.223 02-Jan-2020 thorpej

branches: 1.223.2;
- Eliminate the global "boottime" variable, which was being accessed
without any synchronization against changes by e.g. clock_settime().
- Replace with new getbinboottime() / getnanoboottime() / getmicroboottime()
functions (naming mirrors that of other time access functions in kern_tc.c).
It returns the (maybe-converted) value of timebasebin, which also tracks
our estimate of when the system was booted (i.e. the legacy "boottime" was
redundant).

XXX There needs to be a lockless synchronization mechanism for reading
timebasebin, but this is a problem in kern_tc.c that pre-existed these
"boottime" changes. At least now the problem is centralized in one location.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118
# 1.222 15-Jan-2019 mrg

remove kern.panic_now -- crashme panic node replaces it.


Revision tags: pgoyette-compat-1226
# 1.221 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI


# 1.220 03-Dec-2018 christos

Expose addresses depending on the KASLR setting (from mrg@). Restores the
status quo of exposing kernel addresses if there is no KASLR.


Revision tags: pgoyette-compat-1126
# 1.219 24-Nov-2018 maxv

Fix kernel pointer leaks in the kern.lwp sysctl.


Revision tags: pgoyette-compat-1020
# 1.218 05-Oct-2018 christos

Provide a sysctl kern.expose_address to expose kernel addresses in
sysctl structure returns for non-root. Defaults to off. Turning it
on will restore sockstat/fstat and friends for regular users.


Revision tags: pgoyette-compat-0930
# 1.217 16-Sep-2018 mrg

CTL_DEBUG_MAXID is only used to size a static array that the compiler
can do just fine itself. use the compiler and remove the define.


Revision tags: pgoyette-compat-0906
# 1.216 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
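
The difference between the two kinds of min described above can be sketched
as follows. sketch_uimin() and SKETCH_MIN() are hypothetical stand-ins
written for this example, and it assumes unsigned int is narrower than
64 bits, as it is on common platforms.

```c
#include <assert.h>
#include <stdint.h>

/* Function form: defined on unsigned int, like the renamed uimin. */
unsigned int
sketch_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: evaluates in the operands' own type, so no truncation. */
#define SKETCH_MIN(a, b)	((a) < (b) ? (a) : (b))

/* 64-bit arguments are implicitly converted to unsigned int here. */
uint64_t
clamp_with_function(uint64_t len, uint64_t limit)
{
	assert(sizeof(unsigned int) < sizeof(uint64_t));
	return sketch_uimin(len, limit);
}

/* The comparison is done on the full 64-bit values. */
uint64_t
clamp_with_macro(uint64_t len, uint64_t limit)
{
	return SKETCH_MIN(len, limit);
}
```

With a length of 0x100000005 and a limit of 10, the function form compares
the truncated value 5 against 10 and returns 5, while the macro form
returns 10.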


# 1.215 22-Aug-2018 msaitoh

- Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.214 04-Feb-2018 maxv

branches: 1.214.2; 1.214.4;
Add a proper defflag for GPROF, and include opt_gprof.h, otherwise we're
not gonna go very far.


Revision tags: netbsd-8-2-RELEASE netbsd-8-1-RELEASE netbsd-8-1-RC1 netbsd-8-0-RELEASE netbsd-8-0-RC2 netbsd-8-0-RC1 tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.213 01-Jun-2017 chs

remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.212 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.211 31-May-2016 pgoyette

branches: 1.211.2;
Add a new kern.messages sysctl to allow kernel message verbosity to be
altered after boot.

Fixes PR kern/46539 using patch submitted by Nat Sloss.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.210 09-Nov-2015 pgoyette

Whether or not the semaphore code is loaded as a module or built-in, its
sysctl data belongs with the module code. Move it from kern/init_sysctl.c
to kern/uipc_sem.c

While here, add a new sysctl variable kern.posix.semcnt (current count of
semaphores) to complement the existing kern.posix.semmax (maximum number
of semaphores).


Revision tags: nick-nhusb-base-20150921
# 1.209 25-Aug-2015 pooka

Move a bunch of sysctl nodes from init_sysctl (kitchen sink sysctl file)
to init_sysctl_base (only base kernel defs). Main motivation was to
fix sysconf(_SC_NPROCESSORS) for Rumprun. As reported by neeraj on irc,
it returned -1 before this fix, so we were doing imaginary computing.


# 1.208 07-Jul-2015 justin

Move hw.machine and hw.machine_arch sysctls to base so rump can use them

This allows uname(3) and uname(1) to work on rump kernels.


Revision tags: nick-nhusb-base-20150606
# 1.207 20-May-2015 pooka

group msgbuf sysctls with the msgbuf code
(init_sysctl.c -> subr_log.c)


# 1.206 13-May-2015 pgoyette

More preparation for modularizing the SYSVxxx options. Here we
change the kern.ipc.sysvxxx sysctls into dynamic values, so each
sub-component of SYSVxxx can declare its own availability.


# 1.205 22-Apr-2015 pooka

move clock sysctls from init_sysctl.c to kern_clock.c


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.204 03-Aug-2014 apb

branches: 1.204.4;
BUILDINFO part 2: expose sysctl kern.buildinfo


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.203 08-May-2014 hannken

Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15
# 1.202 24-Mar-2014 christos

branches: 1.202.2;
- create cpu_{g,s}etmodel() and hide cpu_model from direct access.


Revision tags: riastradh-drm2-base3
# 1.201 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.200 25-Feb-2014 justin

Add kern.{ostype,osrelease,osrevision,version} kern.domainname,
kern.rawpartition sysctl support to rump kernel.
Moved the sysctl support that is shared between rump and normal
kernels to init_sysctl_base.c as rump cannot use init_sysctl.c
in order to avoid code duplication. Agreed with pooka@.


# 1.199 17-Jan-2014 pooka

Put cprng sysctls into subr_cprng.c. Also, make sysctl_prng static
in subr_cprng and get rid of SYSCTL_PRIVATE namespace leak macro.

Fixes ping(8) when run against a standalone rump kernel due to appearance
of the kern.urandom sysctl node (in case someone was wondering ...)


# 1.198 14-Sep-2013 joerg

GC various arrays defined and used in kern_proc.c


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.197 18-Mar-2013 para

branches: 1.197.6;
calculate vnode cache size based on the resource it gets allocated from
this stops kern.maxvnodes being set so high that it exhausts available space in kmem

http://mail-index.netbsd.org/tech-kern/2013/03/08/msg015095.html


# 1.196 07-Mar-2013 matt

Add a kern.configname sysctl object.


# 1.195 21-Feb-2013 pgoyette

Move boottime50 and its associated sysctl into the compat module. As
noted on tech-kern. Should fix PR/47579.

OK christos@

Will request pull-up to 6.0 in a few days.


# 1.194 02-Feb-2013 matt

Make the inclusion of <sys/cprng.h> a private matter for sysctl. No reason
to expose the rest of the kernel to it.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.193 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


# 1.192 08-Oct-2012 pooka

put all kern socket sysctls in the same place


# 1.191 03-Oct-2012 mlelstv

Add sanity check to sysctl_kern_maxvnodes.


# 1.190 02-Jun-2012 dsl

branches: 1.190.2;
Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.189 07-Apr-2012 christos

remove bogus check.


Revision tags: jmcneill-usbmp-base8 jmcneill-usbmp-base7
# 1.188 10-Mar-2012 joerg

P1003_1B_SEMAPHORE is no longer optional.


Revision tags: jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.187 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-0-5-RELEASE netbsd-6-0-4-RELEASE netbsd-6-0-3-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.186 17-Dec-2011 tls

branches: 1.186.2;

Separate /dev/random pseudodevice implementation from kernel entropy pool
implementation. Rewrite pseudodevice code to use cprng_strong(9).

The new pseudodevice is cloning, so each caller gets bits from a stream
generated with its own key. Users of /dev/urandom get their generators
keyed on a "best effort" basis -- the kernel will rekey generators
whenever the entropy pool hits the high water mark -- while users of
/dev/random get their generators rekeyed every time key-length bits
are output.

The underlying cprng_strong API can use AES-256 or AES-128, but we use
AES-128 because of concerns about related-key attacks on AES-256. This
improves performance (and reduces entropy pool depletion) significantly
for users of /dev/urandom but does cause users of /dev/random to rekey
twice as often.

Also fixes various bugs (including some missing locking and a reseed-counter
overflow in the CTR_DRBG code) found while testing this.

For long reads, this generator is approximately 20 times as fast as the
old generator (dd with bs=64K yields 53MB/sec on 2Ghz Core2 instead of
2.5MB/sec) and also uses a separate mutex per instance so concurrency
is greatly improved. For reads of typical key sizes for modern
cryptosystems (16-32 bytes) performance is about the same as the old
code: a little better for 32 bytes, a little worse for 16 bytes.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base
# 1.185 20-Nov-2011 tls

branches: 1.185.2;
An undocumented behavior of the sysctl kern.arandom node used to allow
sucking up to 8192 bytes out of the kernel arc4random() generator at a
time. Supposedly some very old application code uses this to rekey
other instances of RC4 in userspace (a truly great idea). Reduce the
limit to 256 bytes -- and note that it will probably be reduced to
sizeof(int) in the future, since this node is so documented.


# 1.184 19-Nov-2011 tls

First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.


Revision tags: jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.183 30-Aug-2011 bouyer

branches: 1.183.2;
Add getlabelusesmbr(), as proposed in
http://mail-index.netbsd.org/tech-userlevel/2011/08/25/msg005404.html
This is used by disk tools such as disklabel(8) to dynamically decide if
the underlying platform uses a disklabel-in-mbr-partition or not
(instead of using a compile-time list of ports).
getlabelusesmbr() reads the sysctl kern.labelusesmbr, which takes its value
from the machdep #define LABELUSESMBR.
For evbmips, make LABELUSESMBR 1 if the platform uses pmon
as bootloader, and 0 (the previous value) otherwise.


# 1.182 23-Jul-2011 jym

When KERN_SA is not defined, kern.no_sa_support is a constant (1). So
add CTLFLAG_IMMEDIATE to flags. Make the macro block logically reversed so
it looks more natural when reading.

Reported by Peter Tworek on tech-kern@.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.181 24-May-2011 joerg

Add some needed __UNCONST


# 1.180 02-Apr-2011 rmind

vfs_drainvnodes: drop lwp argument, remove variable name in prototype.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.179 05-Feb-2011 christos

avoid code duplication.


# 1.178 28-Jan-2011 pooka

migrate compat32 handling with previous

pointed out by Lars Heidieker


# 1.177 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


# 1.176 22-Jan-2011 christos

Use the L_ flags instead of the P_ flags for lwps.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.175 01-Jul-2010 rmind

branches: 1.175.2; 1.175.4;
Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


# 1.174 16-Jun-2010 pooka

Set kinfo_lwp to 0 before filling it so that if someone removes
variable assignments from here, kernel memory does not leak to
userspace.

Bug found, a little bit surprisingly, by the atf ps test, which failed
because the -o holdcnt column was too wide: the contents displayed
were garbage.
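
The pattern described above can be sketched in userland C. This is only an
illustration under stated assumptions: struct fake_kinfo, fill_info() and
make_info() are hypothetical names, not the real kinfo_lwp or the kernel's
fill routine. The point is that clearing the record first means a field the
filler forgets reads as zero instead of stale memory.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record copied out to userland; not the real kinfo_lwp. */
struct fake_kinfo {
	int  id;
	int  holdcnt;	/* imagine the filler forgets to set this one */
	char name[16];
};

/* Fill a record, clearing it first so unset fields cannot leak old data. */
static void
fill_info(struct fake_kinfo *ki, int id, const char *name)
{
	assert(ki != NULL && name != NULL);

	/* Zero everything, including padding and fields never assigned. */
	memset(ki, 0, sizeof(*ki));
	ki->id = id;
	strncpy(ki->name, name, sizeof(ki->name) - 1);
}

/* Simulate a buffer holding stale contents before it is filled. */
static struct fake_kinfo
make_info(int id, const char *name)
{
	struct fake_kinfo ki;

	memset(&ki, 0xA5, sizeof(ki));	/* stand-in for leftover memory */
	fill_info(&ki, id, name);
	return ki;
}
```

Without the memset() in fill_info(), holdcnt would keep the 0xA5 pattern,
which is the kind of garbage the ps test tripped over.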


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.173 13-Feb-2010 yamt

branches: 1.173.2;
sysctl_doeproc: don't follow a possibly stale pointer.


Revision tags: uebayasi-xip-base
# 1.172 13-Jan-2010 pooka

branches: 1.172.2;
Minimize unnecessary differences in rump.


# 1.171 24-Dec-2009 elad

When reporting open files using sysctl, don't use 'filehead' to fetch files,
as we don't have a process context to authorize on. Instead, traverse the
file descriptor table of each process -- as we already do in one case.

Introduce a "marker" we can use to mark files we've seen in an iteration, as
the same file can be referenced more than once.

Hopefully this availability of filtering by process also makes life easier
for those who are interested in implementing process "containers" etc.


Revision tags: matt-premerge-20091211
# 1.170 12-Dec-2009 dsl

Report L_INMEM in the lwp info as well.


# 1.169 12-Dec-2009 dsl

Always set L_INMEM to maintain binary compatibility.


Revision tags: jym-xensuspend-nbase
# 1.168 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.167 16-Sep-2009 pooka

Chop init_sysctl into base nodes (init_sysctl_base.c) and the
kitchen sink (init_sysctl.c). Further surgery may be needed down
the line.


Revision tags: yamt-nfs-mp-base8
# 1.166 11-Sep-2009 apb

Expose the kernel's boothowto(9) variable through the sysctl
kern.boothowto variable.

Part of the /etc/rc silent changes requested in PR 41946
and proposed in tech-userlevel.


Revision tags: yamt-nfs-mp-base7
# 1.165 16-Aug-2009 christos

provide compatibility for the older variant of kern.consdev, which used
a 32 bit dev_t. Reported by mrg.


Revision tags: jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.164 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


# 1.163 16-May-2009 yamt

sysctl_doeproc:
- simplify.
- KERN_PROC: fix possible stale proc pointer dereference.
- KERN_PROC: don't do copyout with proc_lock held.


Revision tags: yamt-nfs-mp-base4 jym-xensuspend-base
# 1.162 12-May-2009 yamt

don't forget to skip marker processes.


# 1.161 04-May-2009 yamt

sysctl_doeproc: fix a bug in rev.1.135.
don't forget to mark our marker process PK_MARKER.
this fixes crashes in sched_pstats, etc.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.160 29-Mar-2009 mrg

- add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.

- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.

- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)

- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)

- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.

- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but that it's best left
as part of the full-update of compat/freebsd.)


this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.

tested on i386 and sparc64, build tested on several other platforms.

thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)


# 1.159 11-Mar-2009 mrg

like KERN_FILE2: *do* update "needed" when there is no count. we want
userland to know what sort of size to provide..

while here, slightly normalise the previous to init_sysctl.c.


# 1.158 11-Mar-2009 mrg

always calculate "needed" for KERN_FILE2 calls. this allows a caller
to get an estimate of the needed space, like the intention is.


# 1.157 08-Mar-2009 ad

Don't bother with file_t::f_iflags any more, as it's not used.
Noted by mrg@.


Revision tags: nick-hppapmap-base2
# 1.156 13-Feb-2009 apb

Use "defopt MODULAR" in sys/conf/files, and #include "opt_modular.h"
in all kernel sources that use the MODULAR option.
Proposed in tech-kern on 18 Jan 2009.


Revision tags: mjf-devfs2-base
# 1.155 17-Jan-2009 cegger

branches: 1.155.2;
whitespace nit


# 1.154 17-Jan-2009 yamt

malloc -> kmem_alloc.


# 1.153 11-Jan-2009 christos

merge christos-time_t


Revision tags: christos-time_t-nbase christos-time_t-base
# 1.152 29-Dec-2008 pooka

Rename specfs_lock as device_lock and move it from specfs to devsw.
Relaxes kernel dependency on vfs.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.151 28-Nov-2008 elad

PR/40002: Daniel Horecki: sockstat doesn't work for user with sysctl
security.curtain=1

If the kauth call failed, we'd silently continue the loop, but the error
code would remain and eventually "leak" to userspace. Reset the error to
zero when continuing.

Tested by snj@ and myself. Okay snj@.
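
The bug class described above can be sketched as follows. This is a userland
illustration, not the code from this file: may_see() is a hypothetical
stand-in for the authorization call, and count_visible() and struct result
are invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an authorization check: odd ids are hidden. */
static int
may_see(int id)
{
	return (id % 2) ? EPERM : 0;
}

struct result {
	int    error;	/* what would be returned to the caller */
	size_t count;	/* entries the caller was allowed to see */
};

/*
 * Count the visible entries. Entries that fail the check are skipped,
 * and the error is cleared so that a denial on the last entry is not
 * returned to the caller as a failure of the whole request.
 */
static struct result
count_visible(const int *ids, size_t n)
{
	struct result r = { 0, 0 };

	assert(ids != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		r.error = may_see(ids[i]);
		if (r.error != 0) {
			r.error = 0;	/* skipping is not a failure */
			continue;
		}
		r.count++;
	}
	return r;
}
```

If the reset inside the loop is dropped, a list ending in a hidden entry
returns EPERM even though the visible entries were reported correctly.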


# 1.150 12-Nov-2008 ad

Allow the POSIX semaphore code to be loaded as a module.


Revision tags: netbsd-5-base matt-mips64-base2
# 1.149 22-Oct-2008 ad

branches: 1.149.2; 1.149.4;
Set kern.posix_semaphores at runtime so it can be a module.
(Picked wrong header the last time.)


# 1.148 22-Oct-2008 ad

Set kern.posix_semaphores at runtime so it can be a module.


Revision tags: haad-dm-base1
# 1.147 19-Oct-2008 christos

rename proc_representative_lwp to proc_active_lwp and clarify it is for
ps display purposes. suggested by rmind.


# 1.146 19-Oct-2008 christos

Select a "representative" lwp instead of the first lwp in the list. The
first lwp in the list is the last created and in the firefox and gtk-gnash
case this is usually a zombie, so the status in ps was ZLl. This now picks
the lwp in order ONPROC > RUN > SLEEP > STOP > SUSPENDED > IDL > DEAD > ZOMB
and breaks ties using cpticks.
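
A minimal sketch of that selection rule, for illustration only. The enum,
struct fake_lwp and pick_representative() are hypothetical, and the log does
not say which way the cpticks tie-break goes: preferring the higher value is
an assumption made here.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative states, listed from most to least preferred. */
enum lwp_state {
	ST_ONPROC, ST_RUN, ST_SLEEP, ST_STOP,
	ST_SUSPENDED, ST_IDL, ST_DEAD, ST_ZOMB
};

/* Hypothetical, simplified view of an LWP. */
struct fake_lwp {
	enum lwp_state state;
	unsigned       cpticks;
};

/*
 * Return the LWP with the most preferred state; among equals, the one
 * with more cpticks (assumed direction). Returns NULL for an empty list.
 */
static const struct fake_lwp *
pick_representative(const struct fake_lwp *lwps, size_t n)
{
	const struct fake_lwp *best = NULL;

	assert(lwps != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct fake_lwp *l = &lwps[i];

		if (best == NULL || l->state < best->state ||
		    (l->state == best->state && l->cpticks > best->cpticks))
			best = l;
	}
	return best;
}
```

With this ordering a zombie at the head of the list no longer decides the
state shown for a process that still has a sleeping or running thread.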


# 1.145 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.144 15-Jul-2008 christos

make l_flags contain more stuff. Fixes top thread display where system processes
were always displayed.


# 1.143 02-Jul-2008 rmind

branches: 1.143.2;
Remove proc_representative_lwp(), use a simple LIST_FIRST() instead.
OK by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 wrstuden-revivesa-base
# 1.142 16-Jun-2008 ad

PR kern/38927: processes getting stuck in uvm_map (cv_timedwait), hanging
machine

Assume that a vnode (and associated data structures) costs 2kB in the
worst imaginable case. Don't allow sysctl to set desiredvnodes to a
value that would use more than 75% of KVA or 75% of physical memory.
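
The limit described above works out as simple arithmetic. The sketch below
assumes sizes in bytes and uses hypothetical helper names (max_vnodes,
check_desiredvnodes); it is not the check in the kernel, only the
calculation the message implies.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case cost of one vnode plus associated structures, per the log. */
#define VNODE_COST_BYTES 2048u

/*
 * Largest vnode count whose worst-case footprint stays within 75% of
 * both the kernel virtual address space and physical memory.
 */
static uint64_t
max_vnodes(uint64_t kva_bytes, uint64_t physmem_bytes)
{
	uint64_t limit = kva_bytes < physmem_bytes ? kva_bytes : physmem_bytes;

	/* Dividing first keeps the multiplication from overflowing. */
	assert(limit / 4 <= UINT64_MAX / 3);
	return (limit / 4) * 3 / VNODE_COST_BYTES;
}

/* Return 0 if the requested value is acceptable, -1 if it is too large. */
static int
check_desiredvnodes(uint64_t requested, uint64_t kva_bytes,
    uint64_t physmem_bytes)
{
	return requested <= max_vnodes(kva_bytes, physmem_bytes) ? 0 : -1;
}
```

For example, with 1 GiB of KVA and 4 GiB of RAM the smaller resource is the
KVA, so the cap is 768 MiB / 2 KiB = 393216 vnodes.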


# 1.141 16-Jun-2008 ad

- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


Revision tags: yamt-pf42-base3
# 1.140 31-May-2008 ad

branches: 1.140.2;
Kill devsw_lock and just use specfs_lock. The two would need merging
in order to prevent unload of modules when a device that they provide
is still open.


# 1.139 25-May-2008 christos

don't forget to fill in the emulation.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.138 12-May-2008 ad

Use cpu_index(), not ci_cpuid.


# 1.137 30-Apr-2008 ad

branches: 1.137.2;
KERN_FILE_BYPID: fix locking botch.


# 1.136 29-Apr-2008 ad

Don't try grabbing a zombie's p_reflock.


# 1.135 29-Apr-2008 ad

PR kern/37917 /bin/ps no longer shows zombies


# 1.134 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.133 24-Apr-2008 ad

branches: 1.133.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.132 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.131 05-Apr-2008 yamt

branches: 1.131.2;
- l_wmesg is not always valid. check l_wchan when using l_wmesg.
should fix a crash reported by Juan RP on current-users@.
- ttyinfo: lock lwp when accessing l_wmesg.
- fill_lwp: add an assertion.


# 1.130 04-Apr-2008 cegger

use device_xname() where appropriate
OK martin


# 1.129 02-Apr-2008 xtraeme

Revert rev 1.126-1.128. The original code was correct and rmind and I
didn't look correctly at them.


# 1.128 01-Apr-2008 xtraeme

When copying l_name and l_wmesg use KI_LNAMELEN and KI_WMESGLEN
respectively, so that we don't care if l_name/wmesg is longer
than kl_name/wmesg and the KASSERTs added in previous can go away.


# 1.127 01-Apr-2008 xtraeme

Fix previous: use the length of l->l_foo not kl->l_foo and add
two KASSERTs to check for max length limits before copying.

As suggested by rmind@.


# 1.126 01-Apr-2008 xtraeme

fill_lwp: when copying l_wmesg and l_name, use the size of the string
not of the variable.

Found and ok by rmind@.


# 1.125 27-Mar-2008 ad

branches: 1.125.2;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.124 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.123 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.122 30-Jan-2008 ad

branches: 1.122.2; 1.122.6;
Another locking botch.


# 1.121 28-Jan-2008 ad

More file/proc locking fixes.


Revision tags: bouyer-xeni386-nbase
# 1.120 23-Jan-2008 elad

Tons of process scope changes.

- Add a KAUTH_PROCESS_SCHEDULER action, to handle scheduler related
requests, and add specific requests for set/get scheduler policy and
set/get scheduler parameters.

- Add a KAUTH_PROCESS_KEVENT_FILTER action, to handle kevent(2) related
requests.

- Add a KAUTH_DEVICE_TTY_STI action to handle requests to TIOCSTI.

- Add requests for the KAUTH_PROCESS_CANSEE action, indicating what
process information is being looked at (entry itself, args, env,
open files).

- Add requests for the KAUTH_PROCESS_RLIMIT action indicating set/get.

- Add requests for the KAUTH_PROCESS_CORENAME action indicating set/get.

- Make bsd44 secmodel code handle the newly added requests appropriately.

All of the above make it possible to issue finer-grained kauth(9) calls in
many places, removing some KAUTH_GENERIC_ISSUSER requests.

- Remove the "CAN" from KAUTH_PROCESS_CAN{KTRACE,PROCFS,PTRACE,SIGNAL}.

Discussed with christos@ and yamt@.


Revision tags: bouyer-xeni386-base
# 1.119 12-Jan-2008 ad

sysctl_kern_proc_args: avoid zero length allocation.


Revision tags: matt-armv6-base
# 1.118 07-Jan-2008 ad

Patch up sysctl locking:

- Lock processes, credentials, filehead etc correctly.
- Acquire a read hold on sysctl_treelock if only doing a query.
- Don't wire down the output buffer. It doesn't work correctly and the code
regularly does long term sleeps with it held - it's not worth it.
- Don't hold locks other than sysctl_lock while doing copyout().
- Drop sysctl_lock while doing copyout / allocating memory in a few places.
- Don't take kernel_lock for sysctl.
- Fix a number of bugs spotted along the way


# 1.117 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.116 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.115 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.114 10-Dec-2007 elad

- Use KAUTH_ARG() instead of casts,
- Don't ignore return value of settime() in sysctl_kern_rtc_offset(), as
suggested by yamt@.

Note: the kauth(9) call in sysctl_kern_rtc_offset() is bogus, but this will
be addressed separately.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base jmcneill-pm-base reinoud-bufcleanup-base
# 1.113 06-Nov-2007 ad

branches: 1.113.2; 1.113.4; 1.113.6;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.112 19-Oct-2007 ad

branches: 1.112.2;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h


Revision tags: yamt-x86pmap-base4
# 1.111 16-Oct-2007 christos

branches: 1.111.2;
Don't fail to produce the argument vector if the program has modified it
by deleting arguments. This is a popular practice, and failing means that
ps(1) prints (programname). For example, this is what XtOpenDisplay() does
with -geometry. This used to work before 2.0H, and the behavior is allowed and
hinted by POSIX. Found out by Anon Ymous.


# 1.110 16-Oct-2007 christos

- fix comment sentence capitalization.
- whitespace cleanup.
No functional changes.


# 1.109 15-Oct-2007 ad

Add _SC_NPROCESSORS_ONLN and _SC_NPROCESSORS_CONF for sysconf(). These
are extensions but are provided by many Unix systems.


Revision tags: yamt-x86pmap-base3
# 1.108 13-Oct-2007 rmind

sysctl_kern_lwp: Use a correct variable when rechecking if LWP still
exists after relocking. Found via CID: 4689. OK by <dsl>.


Revision tags: vmlocking-base
# 1.107 08-Oct-2007 ad

Merge from vmlocking: don't hold scheduler locks across copyout().


Revision tags: yamt-x86pmap-base2
# 1.106 28-Sep-2007 joerg

Add kern.no_sa_support to easily detect whether a kernel supports
Scheduler Activation or not. This is a negative name as ld.so.conf
conditionals treat undefined sysctls like 0.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.105 15-Aug-2007 ad

branches: 1.105.2; 1.105.4;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.104 06-Aug-2007 yamt

branches: 1.104.2;
remove a homegrown definition of CPU_INFO_FOREACH.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.103 09-Jul-2007 ad

branches: 1.103.2; 1.103.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.102 30-Jun-2007 dsl

Add a flags parameter to kauth_cred_get/setgroups() so that sys_get/setgroups
can copy directly to/from userspace.
Avoids exposing the implementation of the group list as an array to code
outside kern_auth.c.
compat code and man page need updating.


# 1.101 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.100 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


Revision tags: thorpej-atomic-base
# 1.99 11-Mar-2007 ad

branches: 1.99.2;
Add the LWP's runtime to kinfo_lwp.


# 1.98 09-Mar-2007 ad

branches: 1.98.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


Revision tags: ad-audiomp-base
# 1.97 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.96 15-Feb-2007 ad

branches: 1.96.2;
Count the number of CPUs at boot and stash in 'ncpu'. Eventually should
have each CPU register at attach, so we can figure out the topology for
the scheduler.


Revision tags: post-newlock2-merge
# 1.95 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase newlock2-base
# 1.94 22-Jan-2007 elad

Don't rely on KAUTH_PROCESS_CANSEE for environment just yet,
otherwise we're allowing anyone to read the environment unless
curtain is enabled.

From yamt@.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 netbsd-4-base
# 1.93 27-Nov-2006 elad

branches: 1.93.2;
Move Veriexec's sysctl(9) setup routine and helper to kern_verifiedexec.c.


# 1.92 25-Nov-2006 christos

PR/34837: Mindaguas: Add SysV SHM dynamic reallocation and locking to the
physical memory


# 1.91 01-Nov-2006 christos

implement kern.arandom properly, instead of lying about it and only filling
the first 4 bytes of the array with random data.


# 1.90 29-Oct-2006 christos

add the emulation in kinfo_proc2


Revision tags: yamt-splraiseipl-base2
# 1.89 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.88 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


# 1.87 24-Sep-2006 dogcow

correct dcopyout #define for !KTRACE case.


# 1.86 23-Sep-2006 manu

Add a -t+S flag to ktrace for tracing activity related to sysctl. MIB
names will be displayed, with data read and written as well.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.85 13-Sep-2006 elad

branches: 1.85.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.84 10-Sep-2006 manu

When getting the program argument or environment strings, we previously
assumed that all the strings were stored in a row, separated by NUL chars,
at the address pointed to by argv[0] (or envp[0]).

This was wrong: if the program changed argv[0], we still read the
first string correctly, but the next strings did contain unexpected data.

The fix: read the whole argv (or envp) array, then copy the strings one by
one, using their addresses in argv (or envp).
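
The fix described above can be illustrated in userland. This is only a
sketch of the idea, not the kernel code: copy_vector_strings() is a
hypothetical helper that follows each pointer in the vector instead of
assuming the strings sit back to back after vec[0].

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy each string referenced by vec[0..n-1] into buf, NUL-separated.
 * Every string is located through its own pointer, so the result stays
 * correct even if the program re-pointed vec[0] somewhere else.
 * Returns the number of bytes written, or (size_t)-1 if buf is too small.
 */
static size_t
copy_vector_strings(char *const vec[], size_t n, char *buf, size_t buflen)
{
	size_t off = 0;

	assert(vec != NULL && buf != NULL);
	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(vec[i]) + 1;	/* include the NUL */

		if (len > buflen - off)
			return (size_t)-1;
		memcpy(buf + off, vec[i], len);
		off += len;
	}
	return off;
}
```

The kernel additionally has to copy the pointer array and each string in
from the target process' address space; that part is omitted here.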


# 1.83 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: rpaulo-netinet-merge-pcb-base
# 1.82 08-Sep-2006 manu

When collecting a 32-bit process' argument or environment vector, we need
to convert 32-bit pointers to the 64-bit environment


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.81 26-Jul-2006 dogcow

branches: 1.81.4;
at the request of elad, as veriexec.h has returned, revert the changes
from 2006-07-25.


# 1.80 25-Jul-2006 dogcow

mechanically go through and
s,include "veriexec.h",include <sys/verified_exec.h>,
as the former has apparently gone away.


# 1.79 24-Jul-2006 elad

some fixes:
- adapt to NVERIEXEC in init_sysctl.c.
- we now need "veriexec.h" for NVERIEXEC.
- "opt_verified_exec.h" -> "opt_veriexec.h", and include it only where
it is needed.


# 1.78 23-Jul-2006 ad

Use the LWP cached credentials where sane.


# 1.77 17-Jul-2006 ad

- Don't cast kauth_cred_t to (struct ucred *), just set pc_ucred = NULL.
- Fill ucred::cr_ref.


# 1.76 16-Jul-2006 elad

CURTAIN() -> KAUTH_GENERIC_CANSEE.


# 1.75 14-Jul-2006 elad

move security.setid_core.* to kern.coredump.setid.*, as requested by yamt@.


Revision tags: yamt-pdpolicy-base6
# 1.74 21-Jun-2006 christos

Don't leak memory on success. Allocate only the type of struct that we'll
need for efficiency.


# 1.73 20-Jun-2006 christos

don't allocate too much stuff on the stack.


Revision tags: chap-midi-nbase chap-midi-base
# 1.72 17-Jun-2006 yamt

sysctl_security_setidcorename: don't allocate MAXPATHLEN bytes on stack.


Revision tags: gdamore-uart-base
# 1.71 13-Jun-2006 yamt

branches: 1.71.2;
remove unnecessary arguments from kauth_authorize_process.
ie. make it similar to the one found in apple TN.


# 1.70 13-Jun-2006 yamt

sysctl_kern_file, sysctl_kern_file2: don't abuse kauth_authorize_process
for non-process objects.


# 1.69 13-Jun-2006 yamt

sysctl_kern_file2: fix an indent.


Revision tags: yamt-pdpolicy-base5 simonb-timecounters-base
# 1.68 14-May-2006 elad

branches: 1.68.2;
integrate kauth.


Revision tags: elad-kernelauth-base
# 1.67 17-Apr-2006 elad

Move securelevel-specific stuff to its own file.


# 1.66 14-Apr-2006 blymn

Make i/o statistics collection more generic, include tape drives and
nfs mounts in the set of devices that statistics will be reported on.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.65 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.64 26-Mar-2006 erh

When DIAGNOSTIC is defined, provide a kern.panic_now sysctl to conveniently
and reliably panic the system


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.63 01-Mar-2006 yamt

branches: 1.63.2; 1.63.4; 1.63.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.62 04-Feb-2006 yamt

for some random places, use PNBUF_GET/PUT rather than
- on-stack buffer
- malloc(MAXPATHLEN)


# 1.61 02-Feb-2006 elad

branches: 1.61.2;
implement a security.setid_core node as discussed on tech-kern@ and
tech-security@.


# 1.60 27-Jan-2006 elad

branches: 1.60.2;
remove security node sysctl objects; they are now created using CTL_CREATE.


# 1.59 26-Dec-2005 perry

branches: 1.59.2;
u_intN_t -> uintN_t


# 1.58 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.57 05-Dec-2005 christos

- make settime take timespec.
- avoid wrapping of time in settime.
- pass struct proc down so that we can log a detailed message.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.56 08-Oct-2005 yamt

sysctl_kern_proc_args: don't assume that the process is
resident while we are sleeping.


# 1.55 07-Sep-2005 elad

Implement curtain in KERN_{PROC,PROC2,FILE,FILE2,PROC_ARGS}.
While I'm here, disable curtain by default.


# 1.54 07-Sep-2005 elad

Introduce ``security.curtain'', new node for security features and
settings, and new variable for controlling access to objects based
on user-id.


# 1.53 06-Sep-2005 rpaulo

Implement kern.hardclock_ticks.


# 1.52 24-Aug-2005 simonb

Fix a tyop in a comment.


# 1.51 13-Aug-2005 blymn

Remove the tape stats from here, they caused issues on non-scsipi
architectures.


# 1.50 08-Aug-2005 blymn

Don't include tape stats functions if no devices configured.


# 1.49 07-Aug-2005 blymn

Add tape statistics gathering functions.


# 1.48 29-Jul-2005 elad

#ifdef VERIFIED_EXEC


# 1.47 16-Jul-2005 christos

defopt verified_exec.


# 1.46 17-Jun-2005 atatat

branches: 1.46.2;
Comment in new cp_id implementation was wrong since I abandoned
rewriting it in favor of some testing and then never got back to it.
It's better now.


# 1.45 16-Jun-2005 christos

Add a new sysctl 'cp_id' that returns the array of cpu id values. Requested by
me, implemented by atatat.


# 1.44 15-Jun-2005 elad

Fix sysctl handling for raise-only variables. This affected the veriexec
node entirely. Reported by Nino Dehne.


# 1.43 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.42 06-Jun-2005 jdc

Revert previous ('_ncpus' is now 'ncpus' again).
MI variable names have precedence.


# 1.41 05-Jun-2005 jdc

Rename 'ncpus' to '_ncpus', otherwise we shadow sparc/sparc64's 'ncpus'
when MULTIPROCESSOR is defined.


# 1.40 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.39 22-May-2005 elad

Add indication for number of fingerprinted files on each device.

When a table is created for a new device, a new variable is created
under the kern.veriexec.count node named "dev_<id>". For example,
dev_0, dev_3, etc.


# 1.38 19-May-2005 elad

Some changes in veriexec.

New features:

- Add a veriexec_report() routine to make most reporting consistent and
remove some common code.
- Add 'strict' mode that controls how veriexec behaves.
- Add sysctl knobs:
o kern.veriexec.verbose controls verbosity levels. Value: 0, 1.
o kern.veriexec.strict controls strict level. Values: 0, 1, 2. See
documentation in sysctl(3) for details.
o kern.veriexec.algorithms returns a string with a space separated
list of supported hashing algorithms in veriexec.
- Updated documentation in man pages for sysctl(3) and sysctl(8).

Bug fixes:

- veriexec_removechk(): Code cleanup + handle FINGERPRINT_NOTEVAL
correctly.
- exec_script(): Don't pass 0 as flag when executing a script; use the
defined VERIEXEC_INDIRECT - which is 1. Makes indirect execution
enforcement work.
- Fix some printing formats and types.


Revision tags: kent-audio2-base
# 1.37 18-Apr-2005 mrg

be explicit in the description for POSIX saved set-id that this is for
POSIX-style, not sane-style. (ie, add "POSIX " to the description.)


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.36 11-Mar-2005 atatat

branches: 1.36.2;
Revert the change that made kern.file2 and net.*.*.pcblist into nodes
instead of structs. It had other deleterious side-effects that are
rather nasty. Another solution must be found.


# 1.35 10-Mar-2005 atatat

Change types of kern.file2 and net.*.*.pcblist to NODE


# 1.34 09-Mar-2005 atatat

Add kern.file2. As kern.proc2 is to kern.proc, so is kern.file2 to
kern.file, namely a 32/64 bit clean sysctl interface to the same data.
It also borrows a few things from struct vnode (if applicable) and
from struct proc, just to tie things together a bit more.

You can walk this list "by file" or "by pid". The former method is
similar to kern.file but omits the filehead, and the latter can give
you duplicates if multiple processes have the same struct file open,
but tells you which process it is.


# 1.33 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.32 01-Oct-2004 yamt

branches: 1.32.4; 1.32.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.31 27-Jul-2004 atatat

branches: 1.31.2;
The message buffer datum instrumented by KERN_MSGBUFSIZE is actually a
long, not an int, and this causes "problems" on LP64be machines
(sparc64, etc). Assign the value to a temporary int and instrument
that instead. Should be fine until someone wants a message buffer
larger than two gigabytes.
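
To see why this matters on LP64 big-endian machines, the following
userland sketch models the byte layout explicitly so it behaves the same
on any host. The helper names are illustrative only and do not come from
the kernel.

```c
#include <stdint.h>

/* Lay out v as 8 big-endian bytes, as a 64-bit long is stored on LP64be. */
static void
store_be64(uint8_t out[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Read 4 big-endian bytes as a 32-bit value, as an int-sized reader would. */
static uint32_t
load_be32(const uint8_t *in)
{
	return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
	    ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Broken: treat the start of the 64-bit long as if it were an int. */
static uint32_t
read_long_as_int_be(uint64_t value)
{
	uint8_t bytes[8];

	store_be64(bytes, value);
	return load_be32(bytes);	/* picks up the high-order half */
}

/* Fixed: narrow into a 32-bit temporary first, then export that. */
static uint32_t
read_via_temporary_be(uint64_t value)
{
	uint32_t tmp = (uint32_t)value;
	uint8_t bytes[4];

	for (int i = 0; i < 4; i++)
		bytes[i] = (uint8_t)(tmp >> (24 - 8 * i));
	return load_be32(bytes);
}
```

For any buffer size below 4 GB the broken variant yields 0, because the
first four bytes of the big-endian long are its high-order half.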


# 1.30 26-May-2004 christos

(off_t)(long) is wrong when it comes to kernel addresses [because on a 32 bit
machine if the high bit is set they turn negative]. Make an intermediate cast
to unsigned long.
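
The effect can be reproduced in userland with fixed-width types standing
in for a 32-bit machine's long and a 64-bit off_t. The typedef and
function names below are assumptions made for this sketch.

```c
#include <stdint.h>

/* Model of a 32-bit machine: long is 32 bits wide, off_t is 64 bits. */
typedef int32_t  model_long_t;
typedef uint32_t model_ulong_t;
typedef int64_t  model_off_t;

/* (off_t)(long): the high bit of the address is sign-extended. */
static model_off_t
widen_signed(uint32_t kaddr)
{
	/* Narrowing to a signed type is implementation-defined in C, but
	 * wraps to a negative value on common two's-complement compilers. */
	return (model_off_t)(model_long_t)kaddr;
}

/* (off_t)(unsigned long)...: the address is zero-extended instead. */
static model_off_t
widen_unsigned(uint32_t kaddr)
{
	return (model_off_t)(model_ulong_t)kaddr;
}
```

Kernel addresses on many 32-bit ports live in the upper half of the
address space, so the signed path turns them into negative offsets.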


# 1.29 03-May-2004 martin

Fix a comment.
Approved by Andrew Brown.


# 1.28 23-Apr-2004 simonb

s/the the/the/ (only in sources that aren't regularly imported from
elsewhere).


# 1.27 16-Apr-2004 atatat

Prefer that kern.hostid is printed in hex, not as a signed decimal,
and avoid accidental sign-extension when setting it.


# 1.26 08-Apr-2004 atatat

Lots of sysctl descriptions (if someone wants to help out here, that
would be good) mostly copied from sysctl(3). This takes care of the
top-level, most of kern.* and hw.* (modulo the ath and bge stuff), and
all of proc.*.

If you don't want the added rodata in your kernel, use "options
SYSCTL_NO_DESCR" in your kernel config.


# 1.25 08-Apr-2004 atatat

Clear out the struct kinfo_drivers before stuffing things into it.
Avoids leaking garbage from the stack (left over from the earlier
call to sysctl_locate()).


Revision tags: netbsd-2-0-base
# 1.24 24-Mar-2004 atatat

branches: 1.24.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.23 17-Mar-2004 yamt

- move kern.somaxkva sysctl stuff from init_sysctl.c to uipc_socket.c.
- when changing its value, wakeup sokva waiters.


# 1.22 21-Feb-2004 atatat

Use KERN_PROCSLOP for struct kinfo_proc and KERN_LWPSLOP for
struct kinfo_lwp, and not vice versa.

Should solve the issue with top dying because it's unable to "allocate
memory".


# 1.21 19-Feb-2004 atatat

Use new PTRTOUINT64() macro instead of local PTRTOINT64() macro.


# 1.20 17-Jan-2004 atatat

Avoid dereferencing l...it might be NULL


# 1.19 28-Dec-2003 atatat

Sysctl functions called for "generic" nodes should forward "query"
requests (where possible), rather than returning errors.


# 1.18 28-Dec-2003 atatat

Adjust error returns in kern.cp_time when a specific processor is
being requested so that (1) the uniprocessor case and the
multiprocessor case are more similar and (2) so that we return ENOENT
when a non-existent processor is requested (which is both more
sensible and follows the general order of things anyway).


# 1.17 28-Dec-2003 atatat

Rename sysctl_kern_hostname() to sysctl_setlen() and use it also for
domainname. Note that there's no need to copy rnode since we're not
changing any of it, nor protecting anything from change.

Thanks to martin for initial work.


# 1.16 28-Dec-2003 atatat

RCSid police


# 1.15 28-Dec-2003 martin

After changing hostname, adjust hostnamelen.
This closes PR kern/23907.


# 1.14 26-Dec-2003 martin

Make kern.rtc_offset writable at securelevel <= 0.
This allows boot-time adjustment when a machine runs other OSes with
RTC == localtime.


# 1.13 20-Dec-2003 yamt

update a comment to match with the previous change (rev.1.12).


# 1.12 20-Dec-2003 yamt

restore functionality to decrease kern.maxvnodes which
has been backed out during sysctl rework.


# 1.11 12-Dec-2003 simonb

In sysctl_kern_lwp adjust offsets into the mib entries so that
they are now correct. Fixes problems with "ps -s" not working.
Also use KERN_LWPSLOP instead of KERN_PROCSLOP.

Both changes from Andrew Brown.


# 1.10 10-Dec-2003 atatat

Make kern.dump_on_panic writeable again, too


# 1.9 09-Dec-2003 atatat

Make kern.sbmax writeable again as well.

From a follow-on to PR kern/23695 by a Mr. Davis, which I missed at a
quick glance.


# 1.8 09-Dec-2003 atatat

Make kern.logsigexit writeable again.

Fixes PR kern/23695.


# 1.7 07-Dec-2003 martin

Add missing break.


# 1.6 07-Dec-2003 he

Also make declaration of sysctl_kern_maxptys() depend on NPTY > 0.
Makes the mvme68k RAMDISK kernel compile again.


# 1.5 06-Dec-2003 martin

Fix kern.cp_time for MULTIPROCESSOR kernels: calculate size of result
correctly, free original instead of incremented pointer, copy results for
n = -2 case too, so top shows correct stats.
Additionally, rearrange code for better readability (from Andrew).


# 1.4 06-Dec-2003 fvdl

Include opt_posix.h for the P1003_1B_SEMAPHORE define.
Include <machine/cpu.h> just to be sure.


# 1.3 06-Dec-2003 martin

We can not count CPUs at sysctl initialization time - so don't make
hw.ncpu an immediate value.


# 1.2 06-Dec-2003 atatat

#include "opt_multiprocessor.h"

This makes hw.ncpu and kern.cp_time work better on those platforms.


# 1.1 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.225 22-Mar-2020 ad

Merge vfs_cache.c from the ad-namecache branch. With this the namecache
index becomes per-directory (initially, a red-black tree). The remaining
changes on the branch to namei()/getcwd() will be merged in the future.


Revision tags: ad-namecache-base3 ad-namecache-base2
# 1.224 18-Jan-2020 skrll

Use 4K pages on ARM_MMU_EXTENDED platforms (all armv[67] except RPI) by
creating a new pool l1ttpl for the userland L1 translation table which
needs to be 8KB and 8KB aligned.

Limit the pool to maxproc and add hooks to allow the sysctl changing of
maxproc to adjust the pool.

This comes at a 5% performance penalty for build.sh -j8 kernel on a
Tegra TK1.


Revision tags: ad-namecache-base1 ad-namecache-base
# 1.223 02-Jan-2020 thorpej

branches: 1.223.2;
- Eliminate the global "boottime" variable, which was being accessed
without any synchronization against changes by e.g. clock_settime().
- Replace with new getbinboottime() / getnanoboottime() / getmicroboottime()
functions (naming mirrors that of other time access functions in kern_tc.c).
It returns the (maybe-converted) value of timebasebin, which also tracks
our estimate of when the system was booted (i.e. the legacy "boottime" was
redundant).

XXX There needs to be a lockless synchronization mechanism for reading
timebasebin, but this is a problem in kern_tc.c that pre-existed these
"boottime" changes. At least now the problem is centralized in one location.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118
# 1.222 15-Jan-2019 mrg

remove kern.panic_now -- crashme panic node replaces it.


Revision tags: pgoyette-compat-1226
# 1.221 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI


# 1.220 03-Dec-2018 christos

Expose addresses depending on the KASLR setting (from mrg@). Restores the
status quo of exposing kernel addresses if there is no KASLR.


Revision tags: pgoyette-compat-1126
# 1.219 24-Nov-2018 maxv

Fix kernel pointer leaks in the kern.lwp sysctl.


Revision tags: pgoyette-compat-1020
# 1.218 05-Oct-2018 christos

Provide a sysctl kern.expose_address to expose kernel addresses in
sysctl structure returns for non-root. Defaults to off. Turning it
on will restore sockstat/fstat and friends for regular users.


Revision tags: pgoyette-compat-0930
# 1.217 16-Sep-2018 mrg

CTL_DEBUG_MAXID is only used to size a static array that the compiler
can do just fine itself. use the compiler and remove the define.


Revision tags: pgoyette-compat-0906
# 1.216 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
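
A small userland sketch of the truncation hazard that motivated the
rename, assuming a 32-bit unsigned int. uimin_sketch() mirrors the shape
of the unsigned-int helper; u64min_sketch() is a hypothetical
width-preserving counterpart for comparison.

```c
#include <stdint.h>

/* Same shape as the renamed helper: both operands are unsigned int. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Hypothetical 64-bit variant that does not narrow its operands. */
static uint64_t
u64min_sketch(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}
```

Passing 0x100000001 to the unsigned-int version silently drops the high
bits and compares 1 against the other operand; the new name at least
makes the operand width visible at the call site.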


# 1.215 22-Aug-2018 msaitoh

- Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.214 04-Feb-2018 maxv

branches: 1.214.2; 1.214.4;
Add a proper defflag for GPROF, and include opt_gprof.h, otherwise we're
not gonna go very far.


Revision tags: netbsd-8-1-RELEASE netbsd-8-1-RC1 netbsd-8-0-RELEASE netbsd-8-0-RC2 netbsd-8-0-RC1 tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.213 01-Jun-2017 chs

remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.212 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.211 31-May-2016 pgoyette

branches: 1.211.2;
Add a new kern.messages sysctl to allow kernel message verbosity to be
altered after boot.

Fixes PR kern/46539 using patch submitted by Nat Sloss.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.210 09-Nov-2015 pgoyette

Whether or not the semaphore code is loaded as a module or built-in, its
sysctl data belongs with the module code. Move it from kern/init_sysctl.c
to kern/uipc_sem.c

While here, add a new sysctl variable kern.posix.semcnt (current count of
semaphores) to complement the existing kern.posix.semmax (maximum number
of semaphores).


Revision tags: nick-nhusb-base-20150921
# 1.209 25-Aug-2015 pooka

Move a bunch of sysctl nodes from init_sysctl (kitchen sink sysctl file)
to init_sysctl_base (only base kernel defs). Main motivation was to
fix sysconf(_SC_NPROCESSORS) for Rumprun. As reported by neeraj on irc,
it returned -1 before this fix, so we were doing imaginary computing.


# 1.208 07-Jul-2015 justin

Move hw.machine and hw.machine_arch sysctls to base so rump can use them

This allows uname(3) and uname(1) to work on rump kernels.


Revision tags: nick-nhusb-base-20150606
# 1.207 20-May-2015 pooka

group msgbuf sysctls with the msgbuf code
(init_sysctl.c -> subr_log.c)


# 1.206 13-May-2015 pgoyette

More preparation for modularizing the SYSVxxx options. Here we
change the kern.ipc.sysvxxx sysctls into dynamic values, so each
sub-component of SYSVxxx can declare its own availability.


# 1.205 22-Apr-2015 pooka

move clock sysctls from init_sysctl.c to kern_clock.c


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.204 03-Aug-2014 apb

branches: 1.204.4;
BUILDINFO part 2: expose sysctl kern.buildinfo


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.203 08-May-2014 hannken

Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15
# 1.202 24-Mar-2014 christos

branches: 1.202.2;
- create cpu_{g,s}etmodel() and hide cpu_model from direct access.


Revision tags: riastradh-drm2-base3
# 1.201 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.200 25-Feb-2014 justin

Add kern.{ostype,osrelease,osrevision,version} kern.domainname,
kern.rawpartition sysctl support to rump kernel.
Moved the sysctl support that is shared between rump and normal
kernels to init_sysctl_base.c as rump cannot use init_sysctl.c
in order to avoid code duplication. Agreed with pooka@.


# 1.199 17-Jan-2014 pooka

Put cprng sysctls into subr_cprng.c. Also, make sysctl_prng static
in subr_cprng and get rid of SYSCTL_PRIVATE namespace leak macro.

Fixes ping(8) when run against a standalone rump kernel due to appearance
of the kern.urandom sysctl node (in case someone was wondering ...)


# 1.198 14-Sep-2013 joerg

GC various arrays defined and used in kern_proc.c


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.197 18-Mar-2013 para

branches: 1.197.6;
calculate the vnode cache size based on the resource it gets allocated from;
this stops kern.maxvnodes being set so high that it exhausts available space in kmem

http://mail-index.netbsd.org/tech-kern/2013/03/08/msg015095.html


# 1.196 07-Mar-2013 matt

Add a kern.configname sysctl object.


# 1.195 21-Feb-2013 pgoyette

Move boottime50 and its associated sysctl into the compat module. As
noted on tech-kern. Should fix PR/47579.

OK christos@

Will request pull-up to 6.0 in a few days.


# 1.194 02-Feb-2013 matt

Make the inclusion of <sys/cprng.h> a private matter for sysctl. No reason
to expose the rest of the kernel to it.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.193 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


# 1.192 08-Oct-2012 pooka

put all kern socket sysctls in the same place


# 1.191 03-Oct-2012 mlelstv

Add sanity check to sysctl_kern_maxvnodes.


# 1.190 02-Jun-2012 dsl

branches: 1.190.2;
Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.189 07-Apr-2012 christos

remove bogus check.


Revision tags: jmcneill-usbmp-base8 jmcneill-usbmp-base7
# 1.188 10-Mar-2012 joerg

P1003_1B_SEMAPHORE is no longer optional.


Revision tags: jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.187 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-0-5-RELEASE netbsd-6-0-4-RELEASE netbsd-6-0-3-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.186 17-Dec-2011 tls

branches: 1.186.2;

Separate /dev/random pseudodevice implementation from kernel entropy pool
implementation. Rewrite pseudodevice code to use cprng_strong(9).

The new pseudodevice is cloning, so each caller gets bits from a stream
generated with its own key. Users of /dev/urandom get their generators
keyed on a "best effort" basis -- the kernel will rekey generators
whenever the entropy pool hits the high water mark -- while users of
/dev/random get their generators rekeyed every time key-length bits
are output.

The underlying cprng_strong API can use AES-256 or AES-128, but we use
AES-128 because of concerns about related-key attacks on AES-256. This
improves performance (and reduces entropy pool depletion) significantly
for users of /dev/urandom but does cause users of /dev/random to rekey
twice as often.

Also fixes various bugs (including some missing locking and a reseed-counter
overflow in the CTR_DRBG code) found while testing this.

For long reads, this generator is approximately 20 times as fast as the
old generator (dd with bs=64K yields 53MB/sec on 2GHz Core2 instead of
2.5MB/sec) and also uses a separate mutex per instance so concurrency
is greatly improved. For reads of typical key sizes for modern
cryptosystems (16-32 bytes) performance is about the same as the old
code: a little better for 32 bytes, a little worse for 16 bytes.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base
# 1.185 20-Nov-2011 tls

branches: 1.185.2;
An undocumented behavior of the sysctl kern.arandom node used to allow
sucking up to 8192 bytes out of the kernel arc4random() generator at a
time. Supposedly some very old application code uses this to rekey
other instances of RC4 in userspace (a truly great idea). Reduce the
limit to 256 bytes -- and note that it will probably be reduced to
sizeof(int) in the future, since this node is so documented.


# 1.184 19-Nov-2011 tls

First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.


Revision tags: jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.183 30-Aug-2011 bouyer

branches: 1.183.2;
Add getlabelusesmbr(), as proposed in
http://mail-index.netbsd.org/tech-userlevel/2011/08/25/msg005404.html
This is used by disk tools such as disklabel(8) to dynamically decide if
the underlying platform uses a disklabel-in-mbr-partition or not
(instead of using a compile-time list of ports).
getlabelusesmbr() reads the sysctl kern.labelusesmbr, which takes its value
from the machdep #define LABELUSESMBR.
For evbmips, make LABELUSESMBR 1 if the platform uses pmon
as bootloader, and 0 (the previous value) otherwise.


# 1.182 23-Jul-2011 jym

When KERN_SA is not defined, kern.no_sa_support is a constant (1). So
add CTLFLAG_IMMEDIATE to flags. Make the macro block logically reversed so
it looks more natural when reading.

Reported by Peter Tworek on tech-kern@.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.181 24-May-2011 joerg

Add some needed __UNCONST


# 1.180 02-Apr-2011 rmind

vfs_drainvnodes: drop lwp argument, remove variable name in prototype.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.179 05-Feb-2011 christos

avoid code duplication.


# 1.178 28-Jan-2011 pooka

migrate compat32 handling with previous

pointed out by Lars Heidieker


# 1.177 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


# 1.176 22-Jan-2011 christos

Use the L_ flags instead of the P_ flags for lwps.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.175 01-Jul-2010 rmind

branches: 1.175.2; 1.175.4;
Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


# 1.174 16-Jun-2010 pooka

Set kinfo_lwp to 0 before filling it so that if someone removes
variable assignments from here, kernel memory does not leak to
userspace.

Bug found, a little bit surprisingly, by the atf ps test, which failed
because the -o holdcnt column was too wide, the displayed contents
being garbage.
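
The defensive pattern described in rev 1.174 can be sketched in ordinary C.
This is only an illustration under stated assumptions: struct demo_info and
demo_fill() are hypothetical names, not the real kinfo_lwp or fill_lwp().

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record copied out to userland; not the real kinfo_lwp. */
struct demo_info {
	int32_t	di_id;
	int32_t	di_holdcnt;	/* a field a later edit might forget to set */
	char	di_name[16];
};

/* Fill the record; zeroing first means unassigned fields read as 0. */
void
demo_fill(struct demo_info *di, int32_t id, const char *name)
{
	memset(di, 0, sizeof(*di));
	di->di_id = id;
	/* Copy at most 15 bytes; the memset above supplies the final NUL. */
	strncpy(di->di_name, name, sizeof(di->di_name) - 1);
	/* di_holdcnt is deliberately left unassigned here. */
}
```

With the memset in place, a forgotten assignment yields a zero rather than
whatever happened to be in memory before the call.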


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.173 13-Feb-2010 yamt

branches: 1.173.2;
sysctl_doeproc: don't follow a possibly stale pointer.


Revision tags: uebayasi-xip-base
# 1.172 13-Jan-2010 pooka

branches: 1.172.2;
Minimize unnecessary differences in rump.


# 1.171 24-Dec-2009 elad

When reporting open files using sysctl, don't use 'filehead' to fetch files,
as we don't have a process context to authorize on. Instead, traverse the
file descriptor table of each process -- as we already do in one case.

Introduce a "marker" we can use to mark files we've seen in an iteration, as
the same file can be referenced more than once.

Hopefully this availability of filtering by process also makes life easier
for those who are interested in implementing process "containers" etc.


Revision tags: matt-premerge-20091211
# 1.170 12-Dec-2009 dsl

Report L_INMEM in the lwp info as well.


# 1.169 12-Dec-2009 dsl

Always set L_INMEM to maintain binary compatibility.


Revision tags: jym-xensuspend-nbase
# 1.168 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.167 16-Sep-2009 pooka

Chop init_sysctl into base nodes (init_sysctl_base.c) and the
kitchen sink (init_sysctl.c). Further surgery may be needed down
the line.


Revision tags: yamt-nfs-mp-base8
# 1.166 11-Sep-2009 apb

Expose the kernel's boothowto(9) variable through the sysctl
kern.boothowto variable.

Part of the /etc/rc silent changes requested in PR 41946
and proposed in tech-userlevel.


Revision tags: yamt-nfs-mp-base7
# 1.165 16-Aug-2009 christos

provide compatibility for the older variant of kern.consdev, which used
a 32 bit dev_t. Reported by mrg.


Revision tags: jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.164 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


# 1.163 16-May-2009 yamt

sysctl_doeproc:
- simplify.
- KERN_PROC: fix possible stale proc pointer dereference.
- KERN_PROC: don't do copyout with proc_lock held.


Revision tags: yamt-nfs-mp-base4 jym-xensuspend-base
# 1.162 12-May-2009 yamt

don't forget to skip marker processes.


# 1.161 04-May-2009 yamt

sysctl_doeproc: fix a bug in rev.1.135.
don't forget to mark our marker process PK_MARKER.
this fixes crashes in sched_pstats, etc.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.160 29-Mar-2009 mrg

- add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.

- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.

- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)

- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)

- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.

- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but that it's best left
as part of the full-update of compat/freebsd.)


this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.

tested on i386 and sparc64, build tested on several other platforms.

thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)


# 1.159 11-Mar-2009 mrg

like KERN_FILE2: *do* update "needed" when there is no count. we want
userland to know what sort of size to provide..

while here, slightly normalise the previous to init_sysctl.c.


# 1.158 11-Mar-2009 mrg

always calculate "needed" for KERN_FILE2 calls. this allows a caller
to get an estimate of the needed space, like the intention is.


# 1.157 08-Mar-2009 ad

Don't bother with file_t::f_iflags any more, as it's not used.
Noted by mrg@.


Revision tags: nick-hppapmap-base2
# 1.156 13-Feb-2009 apb

Use "defopt MODULAR" in sys/conf/files, and #include "opt_modular.h"
in all kernel sources that use the MODULAR option.
Proposed in tech-kern on 18 Jan 2009.


Revision tags: mjf-devfs2-base
# 1.155 17-Jan-2009 cegger

branches: 1.155.2;
whitespace nit


# 1.154 17-Jan-2009 yamt

malloc -> kmem_alloc.


# 1.153 11-Jan-2009 christos

merge christos-time_t


Revision tags: christos-time_t-nbase christos-time_t-base
# 1.152 29-Dec-2008 pooka

Rename specfs_lock as device_lock and move it from specfs to devsw.
Relaxes kernel dependency on vfs.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.151 28-Nov-2008 elad

PR/40002: Daniel Horecki: sockstat doesn't work for user with sysctl
security.curtain=1

If the kauth call failed, we'd silently continue the loop, but the error
code would remain and eventually "leak" to userspace. Reset the error to
zero when continuing.
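
A minimal sketch of the bug class fixed in rev 1.151, assuming a
hypothetical check demo_authorize() in place of the real kauth call and a
plain array in place of the kernel's lists:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical authorization check: odd ids are hidden from the caller. */
int
demo_authorize(int id)
{
	return (id % 2 != 0) ? EPERM : 0;
}

/*
 * Count the entries the caller may see.  A denied entry is skipped, and
 * the error is cleared so it cannot become the function's return value.
 */
int
demo_count_visible(const int *ids, size_t n, size_t *countp)
{
	int error = 0;
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		error = demo_authorize(ids[i]);
		if (error != 0) {
			error = 0;	/* skip silently; do not leak EPERM */
			continue;
		}
		count++;
	}
	*countp = count;
	return error;
}
```

Without the reset, a denial on the last entry would be returned to the
caller even though the listing itself succeeded.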

Tested by snj@ and myself. Okay snj@.


# 1.150 12-Nov-2008 ad

Allow the POSIX semaphore code to be loaded as a module.


Revision tags: netbsd-5-base matt-mips64-base2
# 1.149 22-Oct-2008 ad

branches: 1.149.2; 1.149.4;
Set kern.posix_semaphores at runtime so it can be a module.
(Picked wrong header the last time.)


# 1.148 22-Oct-2008 ad

Set kern.posix_semaphores at runtime so it can be a module.


Revision tags: haad-dm-base1
# 1.147 19-Oct-2008 christos

rename proc_representative_lwp to proc_active_lwp and clarify it is for
ps display purposes. suggested by rmind.


# 1.146 19-Oct-2008 christos

Select a "representative" lwp instead of the first lwp in the list. The
first lwp in the list is the last created and in the firefox and gtk-gnash
case this is usually a zombie, so the status in ps was ZLl. This now picks
the lwp in order ONPROC > RUN > SLEEP > STOP > SUSPENDED > IDL > DEAD > ZOMB
and breaks ties using cpticks.
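
The ordering above can be sketched as a single pass over the LWPs. The
types and names here (demo_lwp, demo_pick_lwp) are hypothetical, and the
log does not say which way the cpticks tie-break goes; this sketch assumes
the LWP with more cpticks wins.

```c
#include <stddef.h>

/* Hypothetical states, listed from most to least preferred. */
enum demo_state {
	DS_ONPROC, DS_RUN, DS_SLEEP, DS_STOP,
	DS_SUSPENDED, DS_IDL, DS_DEAD, DS_ZOMB
};

struct demo_lwp {
	enum demo_state	dl_state;
	unsigned int	dl_cpticks;
};

/* Return the preferred entry, or NULL if the array is empty. */
const struct demo_lwp *
demo_pick_lwp(const struct demo_lwp *lwps, size_t n)
{
	const struct demo_lwp *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct demo_lwp *l = &lwps[i];

		/* Better state wins; equal states fall back to cpticks. */
		if (best == NULL ||
		    l->dl_state < best->dl_state ||
		    (l->dl_state == best->dl_state &&
		    l->dl_cpticks > best->dl_cpticks))
			best = l;
	}
	return best;
}
```

A zombie is therefore reported only when every LWP in the process is a
zombie, which is the ps display problem the commit describes.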


# 1.145 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.144 15-Jul-2008 christos

make l_flags contain more stuff. Fixes top thread display where system processes
were always displayed.


# 1.143 02-Jul-2008 rmind

branches: 1.143.2;
Remove proc_representative_lwp(), use a simple LIST_FIRST() instead.
OK by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 wrstuden-revivesa-base
# 1.142 16-Jun-2008 ad

PR kern/38927: processes getting stuck in uvm_map (cv_timedwait), hanging
machine

Assume that a vnode (and associated data structures) costs 2kB in the
worst imaginable case. Don't allow sysctl to set desiredvnodes to a
value that would use more than 75% of KVA or 75% of physical memory.
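
The limit can be worked through with a small calculation. This is a sketch
of the arithmetic only, not the kernel's code: the function names and the
byte-count parameters are assumptions made for illustration.

```c
#include <stdint.h>

#define DEMO_VNODE_COST	2048u	/* assumed worst-case bytes per vnode */

/* Largest vnode count whose worst-case cost fits in 75% of both pools. */
uint64_t
demo_max_vnodes(uint64_t kva_bytes, uint64_t phys_bytes)
{
	uint64_t limit = (kva_bytes < phys_bytes) ? kva_bytes : phys_bytes;

	/* Divide before multiplying so the intermediate cannot overflow. */
	return (limit / 4 * 3) / DEMO_VNODE_COST;
}

/* Return 1 if the requested value would be accepted, 0 otherwise. */
int
demo_vnodes_ok(uint64_t requested, uint64_t kva_bytes, uint64_t phys_bytes)
{
	return requested <= demo_max_vnodes(kva_bytes, phys_bytes);
}
```

For example, with 1 GiB of KVA and 4 GiB of physical memory the smaller
pool governs: 75% of 1 GiB is 768 MiB, which at 2kB per vnode allows
393216 vnodes.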


# 1.141 16-Jun-2008 ad

- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


Revision tags: yamt-pf42-base3
# 1.140 31-May-2008 ad

branches: 1.140.2;
Kill devsw_lock and just use specfs_lock. The two would need merging
in order to prevent unload of modules when a device that they provide
is still open.


# 1.139 25-May-2008 christos

don't forget to fill in the emulation.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.138 12-May-2008 ad

Use cpu_index(), not ci_cpuid.


# 1.137 30-Apr-2008 ad

branches: 1.137.2;
KERN_FILE_BYPID: fix locking botch.


# 1.136 29-Apr-2008 ad

Don't try grabbing a zombie's p_reflock.


# 1.135 29-Apr-2008 ad

PR kern/37917 /bin/ps no longer shows zombies


# 1.134 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.133 24-Apr-2008 ad

branches: 1.133.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.132 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.131 05-Apr-2008 yamt

branches: 1.131.2;
- l_wmesg is not always valid. check l_wchan when using l_wmesg.
should fix a crash reported by Juan RP on current-users@.
- ttyinfo: lock lwp when accessing l_wmesg.
- fill_lwp: add an assertion.


# 1.130 04-Apr-2008 cegger

use device_xname() where appropriate
OK martin


# 1.129 02-Apr-2008 xtraeme

Revert rev 1.126-1.128. The original code was correct and rmind and I
didn't look correctly at them.


# 1.128 01-Apr-2008 xtraeme

When copying l_name and l_wmesg use KI_LNAMELEN and KI_WMESGLEN
respectively, so that we don't care if l_name/wmesg is longer
than kl_name/wmesg and the KASSERTs added in previous can go away.


# 1.127 01-Apr-2008 xtraeme

Fix previous: use the length of l->l_foo not kl->l_foo and add
two KASSERTs to check for max length limits before copying.

As suggested by rmind@.


# 1.126 01-Apr-2008 xtraeme

fill_lwp: when copying l_wmesg and l_name, use the size of the string
not of the variable.

Found and ok by rmind@.


# 1.125 27-Mar-2008 ad

branches: 1.125.2;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.124 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.123 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.122 30-Jan-2008 ad

branches: 1.122.2; 1.122.6;
Another locking botch.


# 1.121 28-Jan-2008 ad

More file/proc locking fixes.


Revision tags: bouyer-xeni386-nbase
# 1.120 23-Jan-2008 elad

Tons of process scope changes.

- Add a KAUTH_PROCESS_SCHEDULER action, to handle scheduler related
requests, and add specific requests for set/get scheduler policy and
set/get scheduler parameters.

- Add a KAUTH_PROCESS_KEVENT_FILTER action, to handle kevent(2) related
requests.

- Add a KAUTH_DEVICE_TTY_STI action to handle requests to TIOCSTI.

- Add requests for the KAUTH_PROCESS_CANSEE action, indicating what
process information is being looked at (entry itself, args, env,
open files).

- Add requests for the KAUTH_PROCESS_RLIMIT action indicating set/get.

- Add requests for the KAUTH_PROCESS_CORENAME action indicating set/get.

- Make bsd44 secmodel code handle the newly added requests appropriately.

All of the above make it possible to issue finer-grained kauth(9) calls in
many places, removing some KAUTH_GENERIC_ISSUSER requests.

- Remove the "CAN" from KAUTH_PROCESS_CAN{KTRACE,PROCFS,PTRACE,SIGNAL}.

Discussed with christos@ and yamt@.


Revision tags: bouyer-xeni386-base
# 1.119 12-Jan-2008 ad

sysctl_kern_proc_args: avoid zero length allocation.


Revision tags: matt-armv6-base
# 1.118 07-Jan-2008 ad

Patch up sysctl locking:

- Lock processes, credentials, filehead etc correctly.
- Acquire a read hold on sysctl_treelock if only doing a query.
- Don't wire down the output buffer. It doesn't work correctly and the code
regularly does long term sleeps with it held - it's not worth it.
- Don't hold locks other than sysctl_lock while doing copyout().
- Drop sysctl_lock while doing copyout / allocating memory in a few places.
- Don't take kernel_lock for sysctl.
- Fix a number of bugs spotted along the way


# 1.117 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.116 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.115 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.114 10-Dec-2007 elad

- Use KAUTH_ARG() instead of casts,
- Don't ignore return value of settime() in sysctl_kern_rtc_offset(), as
suggested by yamt@.

Note: the kauth(9) call in sysctl_kern_rtc_offset() is bogus, but this will
be addressed separately.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base jmcneill-pm-base reinoud-bufcleanup-base
# 1.113 06-Nov-2007 ad

branches: 1.113.2; 1.113.4; 1.113.6;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.112 19-Oct-2007 ad

branches: 1.112.2;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h


Revision tags: yamt-x86pmap-base4
# 1.111 16-Oct-2007 christos

branches: 1.111.2;
Don't fail to produce the argument vector if the program has modified it
by deleting arguments. This is a popular practice, and failing means that
ps(1) prints (programname). For example, this is what XtOpenDisplay() does
with -geometry. This used to work before 2.0H, and the behavior is allowed and
hinted by POSIX. Found out by Anon Ymous.


# 1.110 16-Oct-2007 christos

- fix comment sentence capitalization.
- whitespace cleanup.
No functional changes.


# 1.109 15-Oct-2007 ad

Add _SC_NPROCESSORS_ONLN and _SC_NPROCESSORS_CONF for sysconf(). These
are extensions but are provided by many Unix systems.


Revision tags: yamt-x86pmap-base3
# 1.108 13-Oct-2007 rmind

sysctl_kern_lwp: Use a correct variable when rechecking if LWP still
exists after relocking. Found via CID: 4689. OK by <dsl>.


Revision tags: vmlocking-base
# 1.107 08-Oct-2007 ad

Merge from vmlocking: don't hold scheduler locks across copyout().


Revision tags: yamt-x86pmap-base2
# 1.106 28-Sep-2007 joerg

Add kern.no_sa_support to easily detect whether a kernel supports
Scheduler Activation or not. This is a negative name as ld.so.conf
conditionals treat undefined sysctls like 0.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.105 15-Aug-2007 ad

branches: 1.105.2; 1.105.4;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.104 06-Aug-2007 yamt

branches: 1.104.2;
remove a homegrown definition of CPU_INFO_FOREACH.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.103 09-Jul-2007 ad

branches: 1.103.2; 1.103.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.102 30-Jun-2007 dsl

Add a flags parameter to kauth_cred_get/setgroups() so that sys_get/setgroups
can copy directly to/from userspace.
Avoids exposing the implementation of the group list as an array to code
outside kern_auth.c.
compat code and man page need updating.


# 1.101 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.100 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


Revision tags: thorpej-atomic-base
# 1.99 11-Mar-2007 ad

branches: 1.99.2;
Add the LWP's runtime to kinfo_lwp.


# 1.98 09-Mar-2007 ad

branches: 1.98.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


Revision tags: ad-audiomp-base
# 1.97 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.96 15-Feb-2007 ad

branches: 1.96.2;
Count the number of CPUs at boot and stash in 'ncpu'. Eventually should
have each CPU register at attach, so we can figure out the topology for
the scheduler.


Revision tags: post-newlock2-merge
# 1.95 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase newlock2-base
# 1.94 22-Jan-2007 elad

Don't rely on KAUTH_PROCESS_CANSEE for environment just yet,
otherwise we're allowing anyone to read the environment unless
curtain is enabled.

From yamt@.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 netbsd-4-base
# 1.93 27-Nov-2006 elad

branches: 1.93.2;
Move Veriexec's sysctl(9) setup routine and helper to kern_verifiedexec.c.


# 1.92 25-Nov-2006 christos

PR/34837: Mindaguas: Add SysV SHM dynamic reallocation and locking to the
physical memory


# 1.91 01-Nov-2006 christos

implement kern.arandom properly, instead of lying about it and only filling
the first 4 bytes of the array with random data.


# 1.90 29-Oct-2006 christos

add the emulation in kinfo_proc2


Revision tags: yamt-splraiseipl-base2
# 1.89 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.88 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


# 1.87 24-Sep-2006 dogcow

correct dcopyout #define for !KTRACE case.


# 1.86 23-Sep-2006 manu

Add a -t+S flag to ktrace for tracing activity related to sysctl. MIB
names will be displayed, with data read and written as well.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.85 13-Sep-2006 elad

branches: 1.85.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.84 10-Sep-2006 manu

When getting the program argument or environment strings, we previously
assumed that all the strings were stored in a row, separated by NUL chars,
at the address pointed to by argv[0] (or envp[0]).

This was wrong: if the program changed argv[0], we still read the
first string correctly, but the following strings contained unexpected data.

The fix: read the whole argv (or envp) array, then copy the strings one by
one, using their addresses in argv (or envp).
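
The fix can be sketched in userland C as follows. This is an illustration
under stated assumptions, not the kernel routine: demo_pack_strings() is a
hypothetical name, and the real code has to fetch both the pointer array
and each string from the target process's address space.

```c
#include <stddef.h>
#include <string.h>

/*
 * Pack the strings named by vec[0..n-1] into buf, each followed by its
 * NUL.  Every string is fetched through its own pointer, so the strings
 * need not be contiguous.  Returns bytes written, or 0 if buf is too small.
 */
size_t
demo_pack_strings(char *const *vec, size_t n, char *buf, size_t buflen)
{
	size_t i, used = 0;

	for (i = 0; i < n; i++) {
		size_t len = strlen(vec[i]) + 1;	/* include the NUL */

		if (len > buflen - used)
			return 0;
		memcpy(buf + used, vec[i], len);
		used += len;
	}
	return used;
}
```

Because each entry is followed independently, a program that points
argv[0] at a new string elsewhere in memory no longer corrupts the
arguments that come after it.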


# 1.83 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: rpaulo-netinet-merge-pcb-base
# 1.82 08-Sep-2006 manu

When collecting a 32-bit process' argument or environment vector, we need
to convert the 32-bit pointers to the 64-bit environment


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.81 26-Jul-2006 dogcow

branches: 1.81.4;
at the request of elad, as veriexec.h has returned, revert the changes
from 2006-07-25.


# 1.80 25-Jul-2006 dogcow

mechanically go through and
s,include "veriexec.h",include <sys/verified_exec.h>,
as the former has apparently gone away.


# 1.79 24-Jul-2006 elad

some fixes:
- adapt to NVERIEXEC in init_sysctl.c.
- we now need "veriexec.h" for NVERIEXEC.
- "opt_verified_exec.h" -> "opt_veriexec.h", and include it only where
it is needed.


# 1.78 23-Jul-2006 ad

Use the LWP cached credentials where sane.


# 1.77 17-Jul-2006 ad

- Don't cast kauth_cred_t to (struct ucred *), just set pc_ucred = NULL.
- Fill ucred::cr_ref.


# 1.76 16-Jul-2006 elad

CURTAIN() -> KAUTH_GENERIC_CANSEE.


# 1.75 14-Jul-2006 elad

move security.setid_core.* to kern.coredump.setid.*, as requested by yamt@.


Revision tags: yamt-pdpolicy-base6
# 1.74 21-Jun-2006 christos

Don't leak memory on success. Allocate only the type of struct that we'll
need for efficiency.


# 1.73 20-Jun-2006 christos

don't allocate too much stuff on the stack.


Revision tags: chap-midi-nbase chap-midi-base
# 1.72 17-Jun-2006 yamt

sysctl_security_setidcorename: don't allocate MAXPATHLEN bytes on stack.


Revision tags: gdamore-uart-base
# 1.71 13-Jun-2006 yamt

branches: 1.71.2;
remove unnecessary arguments from kauth_authorize_process.
ie. make it similar to the one found in apple TN.


# 1.70 13-Jun-2006 yamt

sysctl_kern_file, sysctl_kern_file2: don't abuse kauth_authorize_process
for non-process objects.


# 1.69 13-Jun-2006 yamt

sysctl_kern_file2: fix an indent.


Revision tags: yamt-pdpolicy-base5 simonb-timecounters-base
# 1.68 14-May-2006 elad

branches: 1.68.2;
integrate kauth.


Revision tags: elad-kernelauth-base
# 1.67 17-Apr-2006 elad

Move securelevel-specific stuff to its own file.


# 1.66 14-Apr-2006 blymn

Make i/o statistics collection more generic, include tape drives and
nfs mounts in the set of devices that statistics will be reported on.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.65 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.64 26-Mar-2006 erh

When DIAGNOSTIC is defined, provide a kern.panic_now sysctl to conveniently
and reliably panic the system.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.63 01-Mar-2006 yamt

branches: 1.63.2; 1.63.4; 1.63.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.62 04-Feb-2006 yamt

for some random places, use PNBUF_GET/PUT rather than
- on-stack buffer
- malloc(MAXPATHLEN)


# 1.61 02-Feb-2006 elad

branches: 1.61.2;
implement a security.setid_core node as discussed on tech-kern@ and
tech-security@.


# 1.60 27-Jan-2006 elad

branches: 1.60.2;
remove security node sysctl objects; they are now created using CTL_CREATE.


# 1.59 26-Dec-2005 perry

branches: 1.59.2;
u_intN_t -> uintN_t


# 1.58 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.57 05-Dec-2005 christos

- make settime take timespec.
- avoid wrapping of time in settime.
- pass struct proc down so that we can log a detailed message.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.56 08-Oct-2005 yamt

sysctl_kern_proc_args: don't assume that the process is
resident while we are sleeping.


# 1.55 07-Sep-2005 elad

Implement curtain in KERN_{PROC,PROC2,FILE,FILE2,PROC_ARGS}.
While I'm here, disable curtain by default.


# 1.54 07-Sep-2005 elad

Introduce ``security.curtain'', new node for security features and
settings, and new variable for controlling access to objects based
on user-id.


# 1.53 06-Sep-2005 rpaulo

Implement kern.hardclock_ticks.


# 1.52 24-Aug-2005 simonb

Fix a tyop in a comment.


# 1.51 13-Aug-2005 blymn

Remove the tape stats from here, they caused issues on non-scsipi
architectures.


# 1.50 08-Aug-2005 blymn

Don't include tape stats functions if no devices configured.


# 1.49 07-Aug-2005 blymn

Add tape statistics gathering functions.


# 1.48 29-Jul-2005 elad

#ifdef VERIFIED_EXEC


# 1.47 16-Jul-2005 christos

defopt verified_exec.


# 1.46 17-Jun-2005 atatat

branches: 1.46.2;
Comment in new cp_id implementation was wrong since I abandoned
rewriting it in favor of some testing and then never got back to it.
It's better now.


# 1.45 16-Jun-2005 christos

Add a new sysctl 'cp_id' that returns the array of cpu id values. Requested by
me, implemented by atatat.


# 1.44 15-Jun-2005 elad

Fix sysctl handling for raise-only variables. This affected the veriexec
node entirely. Reported by Nino Dehne.


# 1.43 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.42 06-Jun-2005 jdc

Revert previous ('_ncpus' is now 'ncpus' again).
MI variable names have precedence.


# 1.41 05-Jun-2005 jdc

Rename 'ncpus' to '_ncpus', otherwise we shadow sparc/sparc64's 'ncpus'
when MULTIPROCESSOR is defined.


# 1.40 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.39 22-May-2005 elad

Add indication for number of fingerprinted files on each device.

When a table is created for a new device, a new variable is created
under the kern.veriexec.count node named "dev_<id>". For example,
dev_0, dev_3, etc.


# 1.38 19-May-2005 elad

Some changes in veriexec.

New features:

- Add a veriexec_report() routine to make most reporting consistent and
remove some common code.
- Add 'strict' mode that controls how veriexec behaves.
- Add sysctl knobs:
o kern.veriexec.verbose controls verbosity levels. Value: 0, 1.
o kern.veriexec.strict controls strict level. Values: 0, 1, 2. See
documentation in sysctl(3) for details.
o kern.veriexec.algorithms returns a string with a space separated
list of supported hashing algorithms in veriexec.
- Updated documentation in man pages for sysctl(3) and sysctl(8).

Bug fixes:

- veriexec_removechk(): Code cleanup + handle FINGERPRINT_NOTEVAL
correctly.
- exec_script(): Don't pass 0 as flag when executing a script; use the
defined VERIEXEC_INDIRECT - which is 1. Makes indirect execution
enforcement work.
- Fix some printing formats and types.


Revision tags: kent-audio2-base
# 1.37 18-Apr-2005 mrg

be explicit in the description for POSIX saved set-id that this is for
POSIX-style, not sane-style. (ie, add "POSIX " to the description.)


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.36 11-Mar-2005 atatat

branches: 1.36.2;
Revert the change that made kern.file2 and net.*.*.pcblist into nodes
instead of structs. It had other deleterious side-effects that are
rather nasty. Another solution must be found.


# 1.35 10-Mar-2005 atatat

Change types of kern.file2 and net.*.*.pcblist to NODE


# 1.34 09-Mar-2005 atatat

Add kern.file2. As kern.proc2 is to kern.proc, so is kern.file2 to
kern.file, namely a 32/64 bit clean sysctl interface to the same data.
It also borrows a few things from struct vnode (if applicable) and
from struct proc, just to tie things together a bit more.

You can walk this list "by file" or "by pid". The former method is
similar to kern.file but omits the filehead, and the latter can give
you duplicates if multiple processes have the same struct file open,
but tells you which process it is.


# 1.33 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.32 01-Oct-2004 yamt

branches: 1.32.4; 1.32.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.31 27-Jul-2004 atatat

branches: 1.31.2;
The message buffer datum instrumented by KERN_MSGBUFSIZE is actually a
long, not an int, and this causes "problems" on LP64be machines
(sparc64, etc). Assign the value to a temporary int and instrument
that instead. Should be fine until someone wants a message buffer
larger than two gigabytes.
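
The LP64 big-endian problem described in 1.31 can be illustrated in userland. This is a minimal sketch, not code from init_sysctl.c: the helper names are hypothetical, and the byte array simulates how a 64-bit long is laid out in memory on a big-endian machine such as sparc64.

```c
#include <stdint.h>

/* Store v as 8 bytes in big-endian order (simulated sparc64 memory layout). */
static void
store_be64(unsigned char out[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* Decode 4 big-endian bytes into a 32-bit value. */
static uint32_t
load_be32(const unsigned char in[4])
{
	return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
	    ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/*
 * What an int-sized read at the address of the long sees on a
 * big-endian LP64 machine: the HIGH half, which is 0 for small sizes.
 */
static uint32_t
read_long_as_int_be(uint64_t msgbufsize)
{
	unsigned char mem[8];

	store_be64(mem, msgbufsize);
	return load_be32(mem);
}

/* The fix: assign the value to a temporary int and expose that instead. */
static int
read_via_temporary(uint64_t msgbufsize)
{
	int tmp = (int)msgbufsize;	/* fine while the buffer is < 2 GB */

	return tmp;
}
```

With a 16 KB buffer, the int-sized read of the long yields 0, while the temporary yields 16384.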


# 1.30 26-May-2004 christos

(off_t)(long) is wrong when it comes to kernel addresses [because on a 32 bit
machine if the high bit is set they turn negative]. Make an intermediate cast
to unsigned long.
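
The sign-extension problem described in 1.30 can be reproduced in userland. This is a minimal sketch under stated assumptions: the sim_* typedefs stand in for the 32-bit long, 32-bit unsigned long, and 64-bit off_t of an ILP32 machine, and the function names are hypothetical rather than taken from the source file. It also assumes the usual two's-complement wraparound when an out-of-range value is converted to a signed 32-bit type.

```c
#include <stdint.h>

/* Stand-ins (assumptions) for an ILP32 machine's types. */
typedef int32_t  sim_long;	/* long */
typedef uint32_t sim_ulong;	/* unsigned long */
typedef int64_t  sim_off;	/* off_t */

/* Old conversion: (off_t)(long)addr sign-extends if the high bit is set. */
static sim_off
addr_to_off_signed(uint32_t addr)
{
	return (sim_off)(sim_long)addr;
}

/* Fixed conversion: the intermediate unsigned cast zero-extends instead. */
static sim_off
addr_to_off_unsigned(uint32_t addr)
{
	return (sim_off)(sim_ulong)addr;
}
```

For an address such as 0xc0000000 the first function returns a negative offset and the second returns the address unchanged; for addresses with the high bit clear the two agree.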


# 1.29 03-May-2004 martin

Fix a comment.
Approved by Andrew Brown.


# 1.28 23-Apr-2004 simonb

s/the the/the/ (only in sources that aren't regularly imported from
elsewhere).


# 1.27 16-Apr-2004 atatat

Prefer that kern.hostid is printed in hex, not as a signed decimal,
and avoid accidental sign-extension when setting it.


# 1.26 08-Apr-2004 atatat

Lots of sysctl descriptions (if someone wants to help out here, that
would be good) mostly copied from sysctl(3). This takes care of the
top-level, most of kern.* and hw.* (modulo the ath and bge stuff), and
all of proc.*.

If you don't want the added rodata in your kernel, use "options
SYSCTL_NO_DESCR" in your kernel config.


# 1.25 08-Apr-2004 atatat

Clear out the struct kinfo_drivers before stuffing things into it.
Avoids leaking garbage from the stack (left over from the earlier
call to sysctl_locate()).


Revision tags: netbsd-2-0-base
# 1.24 24-Mar-2004 atatat

branches: 1.24.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.23 17-Mar-2004 yamt

- move kern.somaxkva sysctl stuff from init_sysctl.c to uipc_socket.c.
- when changing its value, wakeup sokva waiters.


# 1.22 21-Feb-2004 atatat

Use KERN_PROCSLOP for struct kinfo_proc and KERN_LWPSLOP for
struct kinfo_lwp, and not vice versa.

Should solve the issue with top dying because it's unable to "allocate
memory".


# 1.21 19-Feb-2004 atatat

Use new PTRTOUINT64() macro instead of local PTRTOINT64() macro.


# 1.20 17-Jan-2004 atatat

Avoid dereferencing l...it might be NULL


# 1.19 28-Dec-2003 atatat

Sysctl functions called for "generic" nodes should forward "query"
requests (where possible), rather than returning errors.


# 1.18 28-Dec-2003 atatat

Adjust error returns in kern.cp_time when a specific processor is
being requested so that (1) the uniprocessor case and the
multiprocessor case are more similar and (2) so that we return ENOENT
when a non-existent processor is requested (which is both more
sensible and follows the general order of things anyway).


# 1.17 28-Dec-2003 atatat

Rename sysctl_kern_hostname() to sysctl_setlen() and use it also for
domainname. Note that there's no need to copy rnode since we're not
changing any of it, nor protecting anything from change.

Thanks to martin for initial work.


# 1.16 28-Dec-2003 atatat

RCSid police


# 1.15 28-Dec-2003 martin

After changing hostname, adjust hostnamelen.
This closes PR kern/23907.


# 1.14 26-Dec-2003 martin

Make kern.rtc_offset writable at securelevel <= 0.
This allows boot-time adjustment when a machine runs other OSes with
RTC == localtime.


# 1.13 20-Dec-2003 yamt

update a comment to match with the previous change (rev.1.12).


# 1.12 20-Dec-2003 yamt

restore functionality to decrease kern.maxvnodes which
has been backed out during sysctl rework.


# 1.11 12-Dec-2003 simonb

In sysctl_kern_lwp adjust offsets into the mib entries so that
they are now correct. Fixes problems with "ps -s" not working.
Also use KERN_LWPSLOP instead of KERN_PROCSLOP.

Both changes from Andrew Brown.


# 1.10 10-Dec-2003 atatat

Make kern.dump_on_panic writeable again, too


# 1.9 09-Dec-2003 atatat

Make kern.sbmax writeable again as well.

From a follow-on to PR kern/23695 by a Mr. Davis, which I missed at a
quick glance.


# 1.8 09-Dec-2003 atatat

Make kern.logsigexit writeable again.

Fixes PR kern/23695.


# 1.7 07-Dec-2003 martin

Add missing break.


# 1.6 07-Dec-2003 he

Also make declaration of sysctl_kern_maxptys() depend on NPTY > 0.
Makes the mvme68k RAMDISK kernel compile again.


# 1.5 06-Dec-2003 martin

Fix kern.cp_time for MULTIPROCESSOR kernels: calculate size of result
correctly, free original instead of incremented pointer, copy results for
n = -2 case too, so top shows correct stats.
Additionally, rearrange code for better readability (from Andrew).


# 1.4 06-Dec-2003 fvdl

Include opt_posix.h for the P1003_1B_SEMAPHORE define.
Include <machine/cpu.h> just to be sure.


# 1.3 06-Dec-2003 martin

We can not count CPUs at sysctl initialization time - so don't make
hw.ncpu an immediate value.


# 1.2 06-Dec-2003 atatat

#include "opt_multiprocessor.h"

This makes hw.ncpu and kern.cp_time work better on those platforms.


# 1.1 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.224 18-Jan-2020 skrll

Use 4K pages on ARM_MMU_EXTENDED platforms (all armv[67] except RPI) by
creating a new pool l1ttpl for the userland L1 translation table which
needs to be 8KB and 8KB aligned.

Limit the pool to maxproc and add hooks to allow the sysctl changing of
maxproc to adjust the pool.

This comes at a 5% performance penalty for build.sh -j8 kernel on a
Tegra TK1.


Revision tags: ad-namecache-base1 ad-namecache-base
# 1.223 02-Jan-2020 thorpej

branches: 1.223.2;
- Eliminate the global "boottime" variable, which was being accessed
without any synchronization against changes by e.g. clock_settime().
- Replace with new getbinboottime() / getnanoboottime() / getmicroboottime()
functions (naming mirrors that of other time access functions in kern_tc.c).
It returns the (maybe-converted) value of timebasebin, which also tracks
our estimate of when the system was booted (i.e. the legacy "boottime" was
redundant).

XXX There needs to be a lockless synchronization mechanism for reading
timebasebin, but this is a problem in kern_tc.c that pre-existed these
"boottime" changes. At least now the problem is centralized in one location.


Revision tags: netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118
# 1.222 15-Jan-2019 mrg

remove kern.panic_now -- crashme panic node replaces it.


Revision tags: pgoyette-compat-1226
# 1.221 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
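
The tri-state policy listed in 1.221 could be sketched as follows. All names here are hypothetical and are not the kernel's implementation; in particular, this sketch does not model any special treatment of root, which the commit message does not describe.

```c
#include <stdbool.h>

/* Hypothetical names for the three values of kern.expose_address. */
enum expose_level {
	EXPOSE_NONE = 0,	/* no access */
	EXPOSE_KMEM = 1,	/* access to processes with open /dev/kmem */
	EXPOSE_ALL  = 2		/* access to everyone */
};

/* Defaults per the commit message: 0 on KASLR kernels, 1 otherwise. */
static enum expose_level
default_expose_level(bool kaslr)
{
	return kaslr ? EXPOSE_NONE : EXPOSE_KMEM;
}

/*
 * Decide whether kernel addresses may be shown to the caller.
 * Meant to be evaluated once per sysctl request, not once per process.
 */
static bool
may_expose_address(enum expose_level level, bool caller_has_kmem_open)
{
	switch (level) {
	case EXPOSE_ALL:
		return true;
	case EXPOSE_KMEM:
		return caller_has_kmem_open;
	default:
		return false;
	}
}
```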


# 1.220 03-Dec-2018 christos

Expose addresses depending on the KASLR setting (from mrg@). Restores the
status quo of exposing kernel addresses if there is no KASLR.


Revision tags: pgoyette-compat-1126
# 1.219 24-Nov-2018 maxv

Fix kernel pointer leaks in the kern.lwp sysctl.


Revision tags: pgoyette-compat-1020
# 1.218 05-Oct-2018 christos

Provide a sysctl kern.expose_address to expose kernel addresses in
sysctl structure returns for non-root. Defaults to off. Turning it
on will restore sockstat/fstat and friends for regular users.


Revision tags: pgoyette-compat-0930
# 1.217 16-Sep-2018 mrg

CTL_DEBUG_MAXID is only used to size a static array that the compiler
can do just fine itself. use the compiler and remove the define.


Revision tags: pgoyette-compat-0906
# 1.216 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
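
The truncation hazard that motivates the rename can be shown with a small sketch. The uimin() below is a userland re-creation written for illustration, not the libkern source, and the clamp_* helpers are hypothetical; the sketch assumes a 32-bit unsigned int.

```c
#include <stdint.h>

/* Defined on unsigned int: 64-bit arguments are silently truncated. */
static inline unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: may evaluate arguments twice, but keeps their full width. */
#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/* Clamp a 64-bit length with the function: the high 32 bits are lost. */
static uint64_t
clamp_with_uimin(uint64_t len, uint64_t limit)
{
	return uimin(len, limit);
}

/* Clamp a 64-bit length with the macro: no truncation. */
static uint64_t
clamp_with_MIN(uint64_t len, uint64_t limit)
{
	return MIN(len, limit);
}
```

With len = 2^32 + 5 and limit = 100, the function-based clamp sees len as 5 and returns 5, while the macro-based clamp returns 100.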


# 1.215 22-Aug-2018 msaitoh

- Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.214 04-Feb-2018 maxv

branches: 1.214.2; 1.214.4;
Add a proper defflag for GPROF, and include opt_gprof.h, otherwise we're
not gonna go very far.


Revision tags: netbsd-8-1-RELEASE netbsd-8-1-RC1 netbsd-8-0-RELEASE netbsd-8-0-RC2 netbsd-8-0-RC1 tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.213 01-Jun-2017 chs

remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.212 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.211 31-May-2016 pgoyette

branches: 1.211.2;
Add a new kern.messages sysctl to allow kernel message verbosity to be
altered after boot.

Fixes PR kern/46539 using patch submitted by Nat Sloss.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.210 09-Nov-2015 pgoyette

Whether or not the semaphore code is loaded as a module or built-in, its
sysctl data belongs with the module code. Move it from kern/init_sysctl.c
to kern/uipc_sem.c

While here, add a new sysctl variable kern.posix.semcnt (current count of
semaphores) to complement the existing kern.posix.semmax (maximum number
of semaphores).


Revision tags: nick-nhusb-base-20150921
# 1.209 25-Aug-2015 pooka

Move a bunch of sysctl nodes from init_sysctl (kitchen sink sysctl file)
to init_sysctl_base (only base kernel defs). Main motivation was to
fix sysconf(_SC_NPROCESSORS) for Rumprun. As reported by neeraj on irc,
it returned -1 before this fix, so we were doing imaginary computing.


# 1.208 07-Jul-2015 justin

Move hw.machine and hw.machine_arch sysctls to base so rump can use them

This allows uname(3) and uname(1) to work on rump kernels.


Revision tags: nick-nhusb-base-20150606
# 1.207 20-May-2015 pooka

group msgbuf sysctls with the msgbuf code
(init_sysctl.c -> subr_log.c)


# 1.206 13-May-2015 pgoyette

More preparation for modularizing the SYSVxxx options. Here we
change the kern.ipc.sysvxxx sysctls into dynamic values, so each
sub-component of SYSVxxx can declare its own availability.


# 1.205 22-Apr-2015 pooka

move clock sysctls from init_sysctl.c to kern_clock.c


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.204 03-Aug-2014 apb

branches: 1.204.4;
BUILDINFO part 2: expose sysctl kern.buildinfo


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.203 08-May-2014 hannken

Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15
# 1.202 24-Mar-2014 christos

branches: 1.202.2;
- create cpu_{g,s}etmodel() and hide cpu_model from direct access.


Revision tags: riastradh-drm2-base3
# 1.201 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.200 25-Feb-2014 justin

Add kern.{ostype,osrelease,osrevision,version} kern.domainname,
kern.rawpartition sysctl support to rump kernel.
Moved the sysctl support that is shared between rump and normal
kernels to init_sysctl_base.c as rump cannot use init_sysctl.c
in order to avoid code duplication. Agreed with pooka@.


# 1.199 17-Jan-2014 pooka

Put cprng sysctls into subr_cprng.c. Also, make sysctl_prng static
in subr_cprng and get rid of SYSCTL_PRIVATE namespace leak macro.

Fixes ping(8) when run against a standalone rump kernel due to appearance
of the kern.urandom sysctl node (in case someone was wondering ...)


# 1.198 14-Sep-2013 joerg

GC various arrays defined and used in kern_proc.c


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.197 18-Mar-2013 para

branches: 1.197.6;
Calculate the vnode cache size based on the resource it gets allocated from.
This stops kern.maxvnodes from being set so high that it exhausts the
available space in kmem.

http://mail-index.netbsd.org/tech-kern/2013/03/08/msg015095.html


# 1.196 07-Mar-2013 matt

Add a kern.configname sysctl object.


# 1.195 21-Feb-2013 pgoyette

Move boottime50 and its associated sysctl into the compat module. As
noted on tech-kern. Should fix PR/47579.

OK christos@

Will request pull-up to 6.0 in a few days.


# 1.194 02-Feb-2013 matt

Make the inclusion of <sys/cprng.h> a private matter for sysctl. No reason
to expose the rest of the kernel to it.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.193 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


# 1.192 08-Oct-2012 pooka

put all kern socket sysctls in the same place


# 1.191 03-Oct-2012 mlelstv

Add sanity check to sysctl_kern_maxvnodes.


# 1.190 02-Jun-2012 dsl

branches: 1.190.2;
Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.
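
One way such a compile-time check can be expressed is sketched below using C11 _Generic and _Static_assert. This is an illustration only and is not the macro used in the tree: the DEMO_* names and values are hypothetical, and the real check may rely on different compiler features.

```c
#include <stdint.h>

/* Hypothetical type codes standing in for the CTLTYPE_* constants. */
enum demo_ctltype {
	DEMO_CTLTYPE_INT = 1,
	DEMO_CTLTYPE_QUAD,
	DEMO_CTLTYPE_STRING,
	DEMO_CTLTYPE_OTHER
};

/* Map the static type of a data pointer to the type code it implies. */
#define DEMO_CTLTYPE_OF(ptr) _Generic((ptr),		\
	int *:      DEMO_CTLTYPE_INT,			\
	uint64_t *: DEMO_CTLTYPE_QUAD,			\
	char *:     DEMO_CTLTYPE_STRING,		\
	default:    DEMO_CTLTYPE_OTHER)

/* Fail the build if the declared type and the pointer type disagree. */
#define DEMO_CHECK(type, ptr)					\
	_Static_assert(DEMO_CTLTYPE_OF(ptr) == (type),		\
	    "declared sysctl type does not match data type")

/* Compiles: an int declared as an int. */
int demo_int_value;
DEMO_CHECK(DEMO_CTLTYPE_INT, &demo_int_value);

/*
 * Would not compile: a uint64_t declared as an int, the kind of
 * mismatch the commit message mentions.
 *
 * uint64_t demo_quad_value;
 * DEMO_CHECK(DEMO_CTLTYPE_INT, &demo_quad_value);
 */
```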


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.189 07-Apr-2012 christos

remove bogus check.


Revision tags: jmcneill-usbmp-base8 jmcneill-usbmp-base7
# 1.188 10-Mar-2012 joerg

P1003_1B_SEMAPHORE is no longer optional.


Revision tags: jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.187 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-0-5-RELEASE netbsd-6-0-4-RELEASE netbsd-6-0-3-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.186 17-Dec-2011 tls

branches: 1.186.2;

Separate /dev/random pseudodevice implementation from kernel entropy pool
implementation. Rewrite pseudodevice code to use cprng_strong(9).

The new pseudodevice is cloning, so each caller gets bits from a stream
generated with its own key. Users of /dev/urandom get their generators
keyed on a "best effort" basis -- the kernel will rekey generators
whenever the entropy pool hits the high water mark -- while users of
/dev/random get their generators rekeyed every time key-length bits
are output.

The underlying cprng_strong API can use AES-256 or AES-128, but we use
AES-128 because of concerns about related-key attacks on AES-256. This
improves performance (and reduces entropy pool depletion) significantly
for users of /dev/urandom but does cause users of /dev/random to rekey
twice as often.

Also fixes various bugs (including some missing locking and a reseed-counter
overflow in the CTR_DRBG code) found while testing this.

For long reads, this generator is approximately 20 times as fast as the
old generator (dd with bs=64K yields 53MB/sec on 2GHz Core2 instead of
2.5MB/sec) and also uses a separate mutex per instance so concurrency
is greatly improved. For reads of typical key sizes for modern
cryptosystems (16-32 bytes) performance is about the same as the old
code: a little better for 32 bytes, a little worse for 16 bytes.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base
# 1.185 20-Nov-2011 tls

branches: 1.185.2;
An undocumented behavior of the sysctl kern.arandom node used to allow
sucking up to 8192 bytes out of the kernel arc4random() generator at a
time. Supposedly some very old application code uses this to rekey
other instances of RC4 in userspace (a truly great idea). Reduce the
limit to 256 bytes -- and note that it will probably be reduced to
sizeof(int) in the future, since this node is so documented.


# 1.184 19-Nov-2011 tls

First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.


Revision tags: jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.183 30-Aug-2011 bouyer

branches: 1.183.2;
Add getlabelusesmbr(), as proposed in
http://mail-index.netbsd.org/tech-userlevel/2011/08/25/msg005404.html
This is used by disk tools such as disklabel(8) to dynamically decide if
the underlying platform uses a disklabel-in-mbr-partition or not
(instead of using a compile-time list of ports).
getlabelusesmbr() reads the sysctl kern.labelusesmbr, takes its value from the
machdep #define LABELUSESMBR.
For evbmips, make LABELUSESMBR 1 if the platform uses pmon
as bootloader, and 0 (the previous value) otherwise.


# 1.182 23-Jul-2011 jym

When KERN_SA is not defined, kern.no_sa_support is a constant (1). So
add CTLFLAG_IMMEDIATE to flags. Make the macro block logically reversed so
it looks more natural when reading.

Reported by Peter Tworek on tech-kern@.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.181 24-May-2011 joerg

Add some needed __UNCONST


# 1.180 02-Apr-2011 rmind

vfs_drainvnodes: drop lwp argument, remove variable name in prototype.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.179 05-Feb-2011 christos

avoid code duplication.


# 1.178 28-Jan-2011 pooka

migrate compat32 handling with previous

pointed out by Lars Heidieker


# 1.177 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


# 1.176 22-Jan-2011 christos

Use the L_ flags instead of the P_ flags for lwps.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.175 01-Jul-2010 rmind

branches: 1.175.2; 1.175.4;
Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


# 1.174 16-Jun-2010 pooka

Set kinfo_lwp to 0 before filling it so that if someone removes
variable assignments from here, kernel memory does not leak to
userspace.

Bug found, a little bit surprisingly, by the atf ps test which failed
due to the column width between the -o holdcnt column being too
wide due to the contents displayed being garbage.
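
A minimal userland sketch of the zero-before-fill idea described above. The
names struct kinfo_sketch and fill_sketch() are hypothetical stand-ins, not
the kernel's actual code; the point is only that a field nobody assigns is
exported as zero rather than as stale memory.

```c
#include <string.h>

/* Hypothetical stand-in for a kinfo-style record; not the real kinfo_lwp. */
struct kinfo_sketch {
	int	ks_id;
	int	ks_holdcnt;	/* imagine the assignment for this was removed */
	char	ks_name[16];
};

/* Zero the whole record, then fill in only the fields we know about. */
void
fill_sketch(struct kinfo_sketch *ks, int id, const char *name)
{
	memset(ks, 0, sizeof(*ks));
	ks->ks_id = id;
	/* Copy at most 15 bytes; the memset above guarantees the final NUL. */
	strncpy(ks->ks_name, name, sizeof(ks->ks_name) - 1);
}
```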


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.173 13-Feb-2010 yamt

branches: 1.173.2;
sysctl_doeproc: don't follow a possibly stale pointer.


Revision tags: uebayasi-xip-base
# 1.172 13-Jan-2010 pooka

branches: 1.172.2;
Minimize unnecessary differences in rump.


# 1.171 24-Dec-2009 elad

When reporting open files using sysctl, don't use 'filehead' to fetch files,
as we don't have a process context to authorize on. Instead, traverse the
file descriptor table of each process -- as we already do in one case.

Introduce a "marker" we can use to mark files we've seen in an iteration, as
the same file can be referenced more than once.

Hopefully this availability of filtering by process also makes life easier
for those who are interested in implementing process "containers" etc.


Revision tags: matt-premerge-20091211
# 1.170 12-Dec-2009 dsl

Report L_INMEM in the lwp info as well.


# 1.169 12-Dec-2009 dsl

Always set L_INMEM to maintain binary compatibility.


Revision tags: jym-xensuspend-nbase
# 1.168 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.167 16-Sep-2009 pooka

Chop init_sysctl into base nodes (init_sysctl_base.c) and the
kitchen sink (init_sysctl.c). Further surgery may be needed down
the line.


Revision tags: yamt-nfs-mp-base8
# 1.166 11-Sep-2009 apb

Expose the kernel's boothowto(9) variable through the sysctl
kern.boothowto variable.

Part of the /etc/rc silent changes requested in PR 41946
and proposed in tech-userlevel.


Revision tags: yamt-nfs-mp-base7
# 1.165 16-Aug-2009 christos

provide compatibility for the older variant of kern.consdev, which used
a 32 bit dev_t. Reported by mrg.


Revision tags: jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.164 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


# 1.163 16-May-2009 yamt

sysctl_doeproc:
- simplify.
- KERN_PROC: fix possible stale proc pointer dereference.
- KERN_PROC: don't do copyout with proc_lock held.


Revision tags: yamt-nfs-mp-base4 jym-xensuspend-base
# 1.162 12-May-2009 yamt

don't forget to skip marker processes.


# 1.161 04-May-2009 yamt

sysctl_doeproc: fix a bug in rev.1.135.
don't forget to mark our marker process PK_MARKER.
this fixes crashes in sched_pstats, etc.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.160 29-Mar-2009 mrg

- add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.

- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.

- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)

- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)

- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.

- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but that it's best left
as part of the full-update of compat/freebsd.)


this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.

tested on i386 and sparc64, build tested on several other platforms.

thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that lead to this patch not
having a special ugliness i wasn't happy with anyway :-)


# 1.159 11-Mar-2009 mrg

like KERN_FILE2: *do* update "needed" when there is no count. we want
userland to know what sort of size to provide..

while here, slightly normalise the previous to init_sysctl.c.


# 1.158 11-Mar-2009 mrg

always calculate "needed" for KERN_FILE2 calls. this allows a caller
to get an estimate of the needed space, like the intention is.
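
A rough sketch of the size-query convention this refers to, under the
assumption that records are plain ints; export_records() and its parameters
are hypothetical and do not mirror the real KERN_FILE2 handler. The space a
full copy would take is accumulated whether or not anything is copied, so a
caller can first ask with a NULL buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy up to n records into buf (which may be NULL for a pure size
 * query) and always report the space a full copy would take in *needed.
 * Returns the number of records actually copied.
 */
size_t
export_records(const int *recs, size_t n, void *buf, size_t buflen,
    size_t *needed)
{
	size_t i, copied = 0;
	char *dp = buf;

	*needed = 0;
	for (i = 0; i < n; i++) {
		if (dp != NULL && buflen >= sizeof(recs[i])) {
			memcpy(dp, &recs[i], sizeof(recs[i]));
			dp += sizeof(recs[i]);
			buflen -= sizeof(recs[i]);
			copied++;
		}
		*needed += sizeof(recs[i]);	/* counted whether or not copied */
	}
	return copied;
}
```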


# 1.157 08-Mar-2009 ad

Don't bother with file_t::f_iflags any more, as it's not used.
Noted by mrg@.


Revision tags: nick-hppapmap-base2
# 1.156 13-Feb-2009 apb

Use "defopt MODULAR" in sys/conf/files, and #include "opt_modular.h"
in all kernel sources that use the MODULAR option.
Proposed in tech-kern on 18 Jan 2009.


Revision tags: mjf-devfs2-base
# 1.155 17-Jan-2009 cegger

branches: 1.155.2;
whitespace nit


# 1.154 17-Jan-2009 yamt

malloc -> kmem_alloc.


# 1.153 11-Jan-2009 christos

merge christos-time_t


Revision tags: christos-time_t-nbase christos-time_t-base
# 1.152 29-Dec-2008 pooka

Rename specfs_lock as device_lock and move it from specfs to devsw.
Relaxes kernel dependency on vfs.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.151 28-Nov-2008 elad

PR/40002: Daniel Horecki: sockstat doesn't work for user with sysctl
security.curtain=1

If the kauth call failed, we'd silently continue the loop, but the error
code would remain and eventually "leak" to userspace. Reset the error to
zero when continuing.

Tested by snj@ and myself. Okay snj@.
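
The failure mode can be sketched in userland as follows. may_see() is a
hypothetical stand-in for the authorization call, not the kauth(9) API, and
count_visible() is an invented loop; the sketch only shows why the error has
to be cleared before continuing.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical authorization check: entries owned by someone else are
 * refused. This stands in for the kauth call; it is not the real API.
 */
static int
may_see(int owner, int caller)
{
	return owner == caller ? 0 : EPERM;
}

/*
 * Count the entries visible to `caller`. Entries that fail the check are
 * skipped, and the error is cleared so that it is not returned by accident.
 */
int
count_visible(const int *owners, size_t n, int caller, size_t *countp)
{
	int error = 0;
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		error = may_see(owners[i], caller);
		if (error != 0) {
			error = 0;	/* the fix: do not leak EPERM */
			continue;
		}
		count++;
	}
	*countp = count;
	return error;
}
```

Without the reset, a refused entry at the end of the list would make the
whole request fail even though the visible entries were reported correctly.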


# 1.150 12-Nov-2008 ad

Allow the POSIX semaphore code to be loaded as a module.


Revision tags: netbsd-5-base matt-mips64-base2
# 1.149 22-Oct-2008 ad

branches: 1.149.2; 1.149.4;
Set kern.posix_semaphores at runtime so it can be a module.
(Picked wrong header the last time.)


# 1.148 22-Oct-2008 ad

Set kern.posix_semaphores at runtime so it can be a module.


Revision tags: haad-dm-base1
# 1.147 19-Oct-2008 christos

rename proc_representative_lwp to proc_active_lwp and clarify it is for
ps display purposes. suggested by rmind.


# 1.146 19-Oct-2008 christos

Select a "representative" lwp instead of the first lwp in the list. The
first lwp in the list is the last created and in the firefox and gtk-gnash
case this is usually a zombie, so the status in ps was ZLl. This now picks
the lwp in order ONPROC > RUN > SLEEP > STOP > SUSPENDED > IDL > DEAD > ZOMB
and breaks ties using cpticks.
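
A small sketch of the selection order listed above. The enum, the struct and
pick_representative() are hypothetical, and the direction of the tie-break
(more cpticks wins) is an assumption; the log only says that ties are broken
using cpticks.

```c
#include <stddef.h>

/* Hypothetical state names, listed from most to least preferred. */
enum lwp_state_sketch {
	ST_ONPROC, ST_RUN, ST_SLEEP, ST_STOP,
	ST_SUSPENDED, ST_IDL, ST_DEAD, ST_ZOMB
};

struct lwp_sketch {
	enum lwp_state_sketch	state;
	unsigned int		cpticks;
};

/*
 * Return the index of the preferred entry, or -1 if the list is empty.
 * A lower enum value wins; equal states are broken by cpticks
 * (assumed here: the larger value wins).
 */
int
pick_representative(const struct lwp_sketch *l, size_t n)
{
	size_t i, best = 0;

	if (n == 0)
		return -1;
	for (i = 1; i < n; i++) {
		if (l[i].state < l[best].state ||
		    (l[i].state == l[best].state &&
		    l[i].cpticks > l[best].cpticks))
			best = i;
	}
	return (int)best;
}
```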


# 1.145 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.144 15-Jul-2008 christos

make l_flags contain more stuff. Fixes top thread display where system processes
were always displayed.


# 1.143 02-Jul-2008 rmind

branches: 1.143.2;
Remove proc_representative_lwp(), use a simple LIST_FIRST() instead.
OK by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 wrstuden-revivesa-base
# 1.142 16-Jun-2008 ad

PR kern/38927: processes getting stuck in uvm_map (cv_timedwait), hanging
machine

Assume that a vnode (and associated data structures) costs 2kB in the
worst imaginable case. Don't allow sysctl to set desiredvnodes to a
value that would use more than 75% of KVA or 75% of physical memory.
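
The limit can be sketched as simple arithmetic. The function names and the
byte-count parameters below are hypothetical, and taking the smaller of the
two sizes is one reading of "75% of KVA or 75% of physical memory"; the real
check may be structured differently.

```c
#include <stdint.h>

#define VNODE_COST_SKETCH	2048u	/* assumed worst case: 2kB per vnode */

/*
 * Largest vnode count whose worst-case footprint stays within 75% of
 * both the kernel virtual address space and physical memory (in bytes).
 */
uint64_t
max_desiredvnodes(uint64_t kva_bytes, uint64_t phys_bytes)
{
	uint64_t limit = kva_bytes < phys_bytes ? kva_bytes : phys_bytes;

	return (limit / 4 * 3) / VNODE_COST_SKETCH;
}

/* Returns 1 if the requested value would be accepted, 0 otherwise. */
int
desiredvnodes_ok(uint64_t requested, uint64_t kva_bytes, uint64_t phys_bytes)
{
	return requested <= max_desiredvnodes(kva_bytes, phys_bytes);
}
```

For example, with 1 GiB of KVA and 512 MiB of physical memory the smaller
figure governs: 75% of 512 MiB is 384 MiB, which at 2kB each allows 196608
vnodes.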


# 1.141 16-Jun-2008 ad

- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


Revision tags: yamt-pf42-base3
# 1.140 31-May-2008 ad

branches: 1.140.2;
Kill devsw_lock and just use specfs_lock. The two would need merging
in order to prevent unload of modules when a device that they provide
is still open.


# 1.139 25-May-2008 christos

don't forget to fill in the emulation.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.138 12-May-2008 ad

Use cpu_index(), not ci_cpuid.


# 1.137 30-Apr-2008 ad

branches: 1.137.2;
KERN_FILE_BYPID: fix locking botch.


# 1.136 29-Apr-2008 ad

Don't try grabbing a zombie's p_reflock.


# 1.135 29-Apr-2008 ad

PR kern/37917 /bin/ps no longer shows zombies


# 1.134 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.133 24-Apr-2008 ad

branches: 1.133.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.132 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.131 05-Apr-2008 yamt

branches: 1.131.2;
- l_wmesg is not always valid. check l_wchan when using l_wmesg.
should fix a crash reported by Juan RP on current-users@.
- ttyinfo: lock lwp when accessing l_wmesg.
- fill_lwp: add an assertion.


# 1.130 04-Apr-2008 cegger

use device_xname() where appropriate
OK martin


# 1.129 02-Apr-2008 xtraeme

Revert rev 1.126-1.128. The original code was correct and rmind and I
didn't look correctly at them.


# 1.128 01-Apr-2008 xtraeme

When copying l_name and l_wmesg use KI_LNAMELEN and KI_WMESGLEN
respectively, so that we don't care if l_name/wmesg is longer
than kl_name/wmesg and the KASSERTs added in previous can go away.


# 1.127 01-Apr-2008 xtraeme

Fix previous: use the length of l->l_foo not kl->l_foo and add
two KASSERTs to check for max length limits before copying.

As suggested by rmind@.


# 1.126 01-Apr-2008 xtraeme

fill_lwp: when copying l_wmesg and l_name, use the size of the string
not of the variable.

Found and ok by rmind@.


# 1.125 27-Mar-2008 ad

branches: 1.125.2;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.124 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.123 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.122 30-Jan-2008 ad

branches: 1.122.2; 1.122.6;
Another locking botch.


# 1.121 28-Jan-2008 ad

More file/proc locking fixes.


Revision tags: bouyer-xeni386-nbase
# 1.120 23-Jan-2008 elad

Tons of process scope changes.

- Add a KAUTH_PROCESS_SCHEDULER action, to handle scheduler related
requests, and add specific requests for set/get scheduler policy and
set/get scheduler parameters.

- Add a KAUTH_PROCESS_KEVENT_FILTER action, to handle kevent(2) related
requests.

- Add a KAUTH_DEVICE_TTY_STI action to handle requests to TIOCSTI.

- Add requests for the KAUTH_PROCESS_CANSEE action, indicating what
process information is being looked at (entry itself, args, env,
open files).

- Add requests for the KAUTH_PROCESS_RLIMIT action indicating set/get.

- Add requests for the KAUTH_PROCESS_CORENAME action indicating set/get.

- Make bsd44 secmodel code handle the newly added requests appropriately.

All of the above make it possible to issue finer-grained kauth(9) calls in
many places, removing some KAUTH_GENERIC_ISSUSER requests.

- Remove the "CAN" from KAUTH_PROCESS_CAN{KTRACE,PROCFS,PTRACE,SIGNAL}.

Discussed with christos@ and yamt@.


Revision tags: bouyer-xeni386-base
# 1.119 12-Jan-2008 ad

sysctl_kern_proc_args: avoid zero length allocation.


Revision tags: matt-armv6-base
# 1.118 07-Jan-2008 ad

Patch up sysctl locking:

- Lock processes, credentials, filehead etc correctly.
- Acquire a read hold on sysctl_treelock if only doing a query.
- Don't wire down the output buffer. It doesn't work correctly and the code
regularly does long term sleeps with it held - it's not worth it.
- Don't hold locks other than sysctl_lock while doing copyout().
- Drop sysctl_lock while doing copyout / allocating memory in a few places.
- Don't take kernel_lock for sysctl.
- Fix a number of bugs spotted along the way


# 1.117 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.116 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.115 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.114 10-Dec-2007 elad

- Use KAUTH_ARG() instead of casts,
- Don't ignore return value of settime() in sysctl_kern_rtc_offset(), as
suggested by yamt@.

Note: the kauth(9) call in sysctl_kern_rtc_offset() is bogus, but this will
be addressed separately.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base jmcneill-pm-base reinoud-bufcleanup-base
# 1.113 06-Nov-2007 ad

branches: 1.113.2; 1.113.4; 1.113.6;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.112 19-Oct-2007 ad

branches: 1.112.2;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h


Revision tags: yamt-x86pmap-base4
# 1.111 16-Oct-2007 christos

branches: 1.111.2;
Don't fail to produce the argument vector if the program has modified it
by deleting arguments. This is a popular practice, and failing means that
ps(1) prints (programname). For example this is what XtOpenDisplay() does
with -geometry. This used to work before 2.0H, and the behavior is allowed and
hinted by POSIX. Found out by Anon Ymous.


# 1.110 16-Oct-2007 christos

- fix comment sentence capitalization.
- whitespace cleanup.
No functional changes.


# 1.109 15-Oct-2007 ad

Add _SC_NPROCESSORS_ONLN and _SC_NPROCESSORS_CONF for sysconf(). These
are extensions but are provided by many Unix systems.


Revision tags: yamt-x86pmap-base3
# 1.108 13-Oct-2007 rmind

sysctl_kern_lwp: Use a correct variable when rechecking if LWP still
exists after relocking. Found via CID: 4689. OK by <dsl>.


Revision tags: vmlocking-base
# 1.107 08-Oct-2007 ad

Merge from vmlocking: don't hold scheduler locks across copyout().


Revision tags: yamt-x86pmap-base2
# 1.106 28-Sep-2007 joerg

Add kern.no_sa_support to easily detect whether a kernel supports
Scheduler Activation or not. This is a negative name as ld.so.conf
conditionals treat undefined sysctls like 0.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.105 15-Aug-2007 ad

branches: 1.105.2; 1.105.4;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.104 06-Aug-2007 yamt

branches: 1.104.2;
remove a homegrown definition of CPU_INFO_FOREACH.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.103 09-Jul-2007 ad

branches: 1.103.2; 1.103.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.102 30-Jun-2007 dsl

Add a flags parameter to kauth_cred_get/setgroups() so that sys_set/setgroups
can copy directly to/from userspace.
Avoids exposing the implementation of the group list as an array to code
outside kern_auth.c.
compat code and man page need updating.


# 1.101 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.100 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


Revision tags: thorpej-atomic-base
# 1.99 11-Mar-2007 ad

branches: 1.99.2;
Add the LWP's runtime to kinfo_lwp.


# 1.98 09-Mar-2007 ad

branches: 1.98.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


Revision tags: ad-audiomp-base
# 1.97 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.96 15-Feb-2007 ad

branches: 1.96.2;
Count the number of CPUs at boot and stash in 'ncpu'. Eventually should
have each CPU register at attach, so we can figure out the topology for
the scheduler.


Revision tags: post-newlock2-merge
# 1.95 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase newlock2-base
# 1.94 22-Jan-2007 elad

Don't rely on KAUTH_PROCESS_CANSEE for environment just yet,
otherwise we're allowing anyone to read the environment unless
curtain is enabled.

From yamt@.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 netbsd-4-base
# 1.93 27-Nov-2006 elad

branches: 1.93.2;
Move Veriexec's sysctl(9) setup routine and helper to kern_verifiedexec.c.


# 1.92 25-Nov-2006 christos

PR/34837: Mindaguas: Add SysV SHM dynamic reallocation and locking to the
physical memory


# 1.91 01-Nov-2006 christos

implement kern.arandom properly, instead of lying about it and only filling
the first 4 bytes of the array with random data.


# 1.90 29-Oct-2006 christos

add the emulation in kinfo_proc2


Revision tags: yamt-splraiseipl-base2
# 1.89 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.88 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


# 1.87 24-Sep-2006 dogcow

correct dcopyout #define for !KTRACE case.


# 1.86 23-Sep-2006 manu

Add a -t+S flag to ktrace for tracing activity related to sysctl. MIB
names will be displayed, with data read and written as well.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.85 13-Sep-2006 elad

branches: 1.85.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.84 10-Sep-2006 manu

When getting the program argument or environment string, we previously
assumed that all the strings were stored in a row, separated by NUL chars,
at the address pointed to by argv[0] (or envp[0]).

This was wrong: if the program changed argv[0], we still read the
first string correctly, but the next strings did contain unexpected data.

The fix: read the whole argv (or envp) array, then copy the strings one by
one, using their addresses in argv (or envp)
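
The approach of the fix can be sketched in userland like this. copy_vector()
is a hypothetical helper working on ordinary pointers; the kernel code has to
fetch both the array and each string from another process's address space,
which this sketch does not attempt.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy each string named by vec[0..n-1] into out, NUL-terminated and
 * back to back. Each string is located through its own pointer, so the
 * result is right even if the strings are not contiguous in memory.
 * Returns the number of bytes written, or 0 if out is too small
 * (or n is 0).
 */
size_t
copy_vector(char *const *vec, size_t n, char *out, size_t outlen)
{
	size_t i, used = 0;

	for (i = 0; i < n; i++) {
		size_t len = strlen(vec[i]) + 1;	/* include the NUL */

		if (len > outlen - used)
			return 0;
		memcpy(out + used, vec[i], len);
		used += len;
	}
	return used;
}
```

Because nothing is assumed about where the second and later strings live, a
program that repoints its first argument elsewhere no longer corrupts the
rest of the reported vector.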


# 1.83 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: rpaulo-netinet-merge-pcb-base
# 1.82 08-Sep-2006 manu

When collecting a 32 bit process' argument or environment vector, we need
to convert 32 bit pointers to the 64 bit environment


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.81 26-Jul-2006 dogcow

branches: 1.81.4;
at the request of elad, as veriexec.h has returned, revert the changes
from 2006-07-25.


# 1.80 25-Jul-2006 dogcow

mechanically go through and
s,include "veriexec.h",include <sys/verified_exec.h>,
as the former has apparently gone away.


# 1.79 24-Jul-2006 elad

some fixes:
- adapt to NVERIEXEC in init_sysctl.c.
- we now need "veriexec.h" for NVERIEXEC.
- "opt_verified_exec.h" -> "opt_veriexec.h", and include it only where
it is needed.


# 1.78 23-Jul-2006 ad

Use the LWP cached credentials where sane.


# 1.77 17-Jul-2006 ad

- Don't cast kauth_cred_t to (struct ucred *), just set pc_ucred = NULL.
- Fill ucred::cr_ref.


# 1.76 16-Jul-2006 elad

CURTAIN() -> KAUTH_GENERIC_CANSEE.


# 1.75 14-Jul-2006 elad

move security.setid_core.* to kern.coredump.setid.*, as requested by yamt@.


Revision tags: yamt-pdpolicy-base6
# 1.74 21-Jun-2006 christos

Don't leak memory on success. Allocate only the type of struct that we'll
need for efficiency.


# 1.73 20-Jun-2006 christos

don't allocate too much stuff on the stack.


Revision tags: chap-midi-nbase chap-midi-base
# 1.72 17-Jun-2006 yamt

sysctl_security_setidcorename: don't allocate MAXPATHLEN bytes on stack.


Revision tags: gdamore-uart-base
# 1.71 13-Jun-2006 yamt

branches: 1.71.2;
remove unnecessary arguments from kauth_authorize_process.
ie. make it similar to the one found in apple TN.


# 1.70 13-Jun-2006 yamt

sysctl_kern_file, sysctl_kern_file2: don't abuse kauth_authorize_process
for non-process objects.


# 1.69 13-Jun-2006 yamt

sysctl_kern_file2: fix an indent.


Revision tags: yamt-pdpolicy-base5 simonb-timecounters-base
# 1.68 14-May-2006 elad

branches: 1.68.2;
integrate kauth.


Revision tags: elad-kernelauth-base
# 1.67 17-Apr-2006 elad

Move securelevel-specific stuff to its own file.


# 1.66 14-Apr-2006 blymn

Make i/o statistics collection more generic, include tape drives and
nfs mounts in the set of devices that statistics will be reported on.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.65 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.64 26-Mar-2006 erh

When DIAGNOSTIC is defined, provide a kern.panic_now sysctl to conveniently
and reliably panic the system


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.63 01-Mar-2006 yamt

branches: 1.63.2; 1.63.4; 1.63.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.62 04-Feb-2006 yamt

for some random places, use PNBUF_GET/PUT rather than
- on-stack buffer
- malloc(MAXPATHLEN)


# 1.61 02-Feb-2006 elad

branches: 1.61.2;
implement a security.setid_core node as discussed on tech-kern@ and
tech-security@.


# 1.60 27-Jan-2006 elad

branches: 1.60.2;
remove security node sysctl objects; they are now created using CTL_CREATE.


# 1.59 26-Dec-2005 perry

branches: 1.59.2;
u_intN_t -> uintN_t


# 1.58 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.57 05-Dec-2005 christos

- make settime take timespec.
- avoid wrapping of time in settime.
- pass struct proc down so that we can log a detailed message.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.56 08-Oct-2005 yamt

sysctl_kern_proc_args: don't assume that the process is
resident while we are sleeping.


# 1.55 07-Sep-2005 elad

Implement curtain in KERN_{PROC,PROC2,FILE,FILE2,PROC_ARGS}.
While I'm here, disable curtain by default.


# 1.54 07-Sep-2005 elad

Introduce ``security.curtain'', new node for security features and
settings, and new variable for controlling access to objects based
on user-id.


# 1.53 06-Sep-2005 rpaulo

Implement kern.hardclock_ticks.


# 1.52 24-Aug-2005 simonb

Fix a tyop in a comment.


# 1.51 13-Aug-2005 blymn

Remove the tape stats from here, they caused issues on non-scsipi
architectures.


# 1.50 08-Aug-2005 blymn

Don't include tape stats functions if no devices configured.


# 1.49 07-Aug-2005 blymn

Add tape statistics gathering functions.


# 1.48 29-Jul-2005 elad

#ifdef VERIFIED_EXEC


# 1.47 16-Jul-2005 christos

defopt verified_exec.


# 1.46 17-Jun-2005 atatat

branches: 1.46.2;
Comment in new cp_id implementation was wrong since I abandoned
rewriting it in favor of some testing and then never got back to it.
It's better now.


# 1.45 16-Jun-2005 christos

Add a new sysctl 'cp_id' that returns the array of cpu id values. Requested by
me, implemented by atatat.


# 1.44 15-Jun-2005 elad

Fix sysctl handling for raise-only variables. This affected the veriexec
node entirely. Reported by Nino Dehne.


# 1.43 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.42 06-Jun-2005 jdc

Revert previous ('_ncpus' is now 'ncpus' again).
MI variable names have precedence.


# 1.41 05-Jun-2005 jdc

Rename 'ncpus' to '_ncpus', otherwise we shadow sparc/sparc64's 'ncpus'
when MULTIPROCESSOR is defined.


# 1.40 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.39 22-May-2005 elad

Add indication for number of fingerprinted files on each device.

When a table is created for a new device, a new variable is created
under the kern.veriexec.count node named "dev_<id>". For example,
dev_0, dev_3, etc.


# 1.38 19-May-2005 elad

Some changes in veriexec.

New features:

- Add a veriexec_report() routine to make most reporting consistent and
remove some common code.
- Add 'strict' mode that controls how veriexec behaves.
- Add sysctl knobs:
o kern.veriexec.verbose controls verbosity levels. Value: 0, 1.
o kern.veriexec.strict controls strict level. Values: 0, 1, 2. See
documentation in sysctl(3) for details.
o kern.veriexec.algorithms returns a string with a space separated
list of supported hashing algorithms in veriexec.
- Updated documentation in man pages for sysctl(3) and sysctl(8).

Bug fixes:

- veriexec_removechk(): Code cleanup + handle FINGERPRINT_NOTEVAL
correctly.
- exec_script(): Don't pass 0 as flag when executing a script; use the
defined VERIEXEC_INDIRECT - which is 1. Makes indirect execution
enforcement work.
- Fix some printing formats and types.


Revision tags: kent-audio2-base
# 1.37 18-Apr-2005 mrg

be explicit in the description for POSIX saved set-id that this is for
POSIX-style, not sane-style. (ie, add "POSIX " to the description.)


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.36 11-Mar-2005 atatat

branches: 1.36.2;
Revert the change that made kern.file2 and net.*.*.pcblist into nodes
instead of structs. It had other deleterious side-effects that are
rather nasty. Another solution must be found.


# 1.35 10-Mar-2005 atatat

Change types of kern.file2 and net.*.*.pcblist to NODE


# 1.34 09-Mar-2005 atatat

Add kern.file2. As kern.proc2 is to kern.proc, so is kern.file2 to
kern.file, namely a 32/64 bit clean sysctl interface to the same data.
It also borrows a few things from struct vnode (if applicable) and
from struct proc, just to tie things together a bit more.

You can walk this list "by file" or "by pid". The former method is
similar to kern.file but omits the filehead, and the latter can give
you duplicates if multiple processes have the same struct file open,
but tells you which process it is.


# 1.33 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.32 01-Oct-2004 yamt

branches: 1.32.4; 1.32.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.31 27-Jul-2004 atatat

branches: 1.31.2;
The message buffer datum instrumented by KERN_MSGBUFSIZE is actually a
long, not an int, and this causes "problems" on LP64be machines
(sparc64, etc). Assign the value to a temporary int and instrument
that instead. Should be fine until someone wants a message buffer
larger than two gigabytes.
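
The LP64 big-endian problem described in 1.31 can be illustrated with a
small simulation. This is only a sketch, not the kernel code: the helper
names store_be64(), load_be32() and demo_msgbufsize() are hypothetical, and
the byte array stands in for the storage of a 64-bit long on a big-endian
machine.

```c
#include <assert.h>
#include <stdint.h>

/* Lay out v the way an LP64 big-endian machine stores a long:
 * 8 bytes, most significant byte first. */
void
store_be64(unsigned char out[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* Read 4 bytes, most significant first: what code sees when it treats
 * the start of that storage as a 32-bit int. */
uint32_t
load_be32(const unsigned char *in)
{
	return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
	    ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

void
demo_msgbufsize(void)
{
	unsigned char as_long[8];
	uint64_t size = 16384;	/* illustrative buffer size, an assumption */

	store_be64(as_long, size);
	/* Instrumenting the long as an int reads the high half: zero. */
	assert(load_be32(as_long) == 0);

	/* The approach the log describes: copy into a temporary int and
	 * instrument that instead. Safe while the value fits in an int. */
	int tmp = (int)size;
	assert(tmp == 16384);
}
```

On a little-endian LP64 machine the same mistake happens to read the low
half, which is why the bug only showed up on ports such as sparc64.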


# 1.30 26-May-2004 christos

(off_t)(long) is wrong when it comes to kernel addresses [because on a 32 bit
machine if the high bit is set they turn negative]. Make an intermediate cast
to unsigned long.
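
A minimal sketch of the cast problem from 1.30, assuming a 32-bit long.
The fixed-width types int32_t/uint32_t stand in for long/unsigned long on
such a machine and int64_t stands in for off_t; the function names are
hypothetical and not taken from init_sysctl.c.

```c
#include <assert.h>
#include <stdint.h>

/* The cast chain the log calls wrong: going through a signed 32-bit type
 * sign-extends any address with the high bit set. (Converting an
 * out-of-range value to int32_t is implementation-defined before C23;
 * mainstream compilers wrap to a negative value.) */
int64_t
addr_to_off_signed(uint32_t kaddr)
{
	return (int64_t)(int32_t)kaddr;
}

/* The corrected chain: an intermediate unsigned type zero-extends. */
int64_t
addr_to_off_unsigned(uint32_t kaddr)
{
	return (int64_t)(uint32_t)kaddr;
}

void
demo_addr_cast(void)
{
	uint32_t kaddr = 0xc0000000u;	/* illustrative kernel address */

	assert(addr_to_off_signed(kaddr) < 0);
	assert(addr_to_off_unsigned(kaddr) == 0xc0000000LL);
}
```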


# 1.29 03-May-2004 martin

Fix a comment.
Approved by Andrew Brown.


# 1.28 23-Apr-2004 simonb

s/the the/the/ (only in sources that aren't regularly imported from
elsewhere).


# 1.27 16-Apr-2004 atatat

Prefer that kern.hostid is printed in hex, not as a signed decimal,
and avoid accidental sign-extension when setting it.


# 1.26 08-Apr-2004 atatat

Lots of sysctl descriptions (if someone wants to help out here, that
would be good) mostly copied from sysctl(3). This takes care of the
top-level, most of kern.* and hw.* (modulo the ath and bge stuff), and
all of proc.*.

If you don't want the added rodata in your kernel, use "options
SYSCTL_NO_DESCR" in your kernel config.


# 1.25 08-Apr-2004 atatat

Clear out the struct kinfo_drivers before stuffing things into it.
Avoids leaking garbage from the stack (left over from the earlier
call to sysctl_locate()).


Revision tags: netbsd-2-0-base
# 1.24 24-Mar-2004 atatat

branches: 1.24.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.23 17-Mar-2004 yamt

- move kern.somaxkva sysctl stuff from init_sysctl.c to uipc_socket.c.
- when changing its value, wakeup sokva waiters.


# 1.22 21-Feb-2004 atatat

Use KERN_PROCSLOP for struct kinfo_proc and KERN_LWPSLOP for
struct kinfo_lwp, and not vice versa.

Should solve the issue with top dying because it's unable to "allocate
memory".


# 1.21 19-Feb-2004 atatat

Use new PTRTOUINT64() macro instead of local PTRTOINT64() macro.


# 1.20 17-Jan-2004 atatat

Avoid dereferencing l...it might be NULL


# 1.19 28-Dec-2003 atatat

Sysctl functions called for "generic" nodes should forward "query"
requests (where possible), rather than returning errors.


# 1.18 28-Dec-2003 atatat

Adjust error returns in kern.cp_time when a specific processor is
being requested so that (1) the uniprocessor case and the
multiprocessor case are more similar and (2) so that we return ENOENT
when a non-existent processor is requested (which is both more
sensible and follows the general order of things anyway).


# 1.17 28-Dec-2003 atatat

Rename sysctl_kern_hostname() to sysctl_setlen() and use it also for
domainname. Note that there's no need to copy rnode since we're not
changing any of it, nor protecting anything from change.

Thanks to martin for initial work.


# 1.16 28-Dec-2003 atatat

RCSid police


# 1.15 28-Dec-2003 martin

After changing hostname, adjust hostnamelen.
This closes PR kern/23907.


# 1.14 26-Dec-2003 martin

Make kern.rtc_offset writable at securelevel <= 0.
This allows boot-time adjustment when a machine runs other OSes with
RTC == localtime.


# 1.13 20-Dec-2003 yamt

update a comment to match with the previous change (rev.1.12).


# 1.12 20-Dec-2003 yamt

restore functionality to decrease kern.maxvnodes which
has been backed out during sysctl rework.


# 1.11 12-Dec-2003 simonb

In sysctl_kern_lwp adjust offsets into the mib entries so that
they are now correct. Fixes problems with "ps -s" not working.
Also use KERN_LWPSLOP instead of KERN_PROCSLOP.

Both changes from Andrew Brown.


# 1.10 10-Dec-2003 atatat

Make kern.dump_on_panic writeable again, too


# 1.9 09-Dec-2003 atatat

Make kern.sbmax writeable again as well.

From a follow-on to PR kern/23695 by a Mr. Davis, which I missed at a
quick glance.


# 1.8 09-Dec-2003 atatat

Make kern.logsigexit writeable again.

Fixes PR kern/23695.


# 1.7 07-Dec-2003 martin

Add missing break.


# 1.6 07-Dec-2003 he

Also make declaration of sysctl_kern_maxptys() depend on NPTY > 0.
Makes the mvme68k RAMDISK kernel compile again.


# 1.5 06-Dec-2003 martin

Fix kern.cp_time for MULTIPROCESSOR kernels: calculate size of result
correctly, free original instead of incremented pointer, copy results for
n = -2 case too, so top shows correct stats.
Additionally, rearrange code for better readability (from Andrew).


# 1.4 06-Dec-2003 fvdl

Include opt_posix.h for the P1003_1B_SEMAPHORE define.
Include <machine/cpu.h> just to be sure.


# 1.3 06-Dec-2003 martin

We can not count CPUs at sysctl initialization time - so don't make
hw.ncpu an immediate value.


# 1.2 06-Dec-2003 atatat

#include "opt_multiprocessor.h"

This makes hw.ncpu and kern.cp_time work better on those platforms.


# 1.1 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.223 02-Jan-2020 thorpej

- Eliminate the global "boottime" variable, which was being accessed
without any synchronization against changes by e.g. clock_settime().
- Replace with new getbinboottime() / getnanoboottime() / getmicroboottime()
functions (naming mirrors that of other time access functions in kern_tc.c).
It returns the (maybe-converted) value of timebasebin, which also tracks
our estimate of when the system was booted (i.e. the legacy "boottime" was
redundant).

XXX There needs to be a lockless synchronization mechanism for reading
timebasebin, but this is a problem in kern_tc.c that pre-existed these
"boottime" changes. At least now the problem is centralized in one location.


Revision tags: netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118
# 1.222 15-Jan-2019 mrg

remove kern.panic_now -- crashme panic node replaces it.


Revision tags: pgoyette-compat-1226
# 1.221 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
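
The tri-state policy listed in 1.221 could be sketched as follows. This
is a hedged illustration only: the enum, may_see_addresses() and
default_expose_level() are hypothetical names, not the kernel's
get_expose_address() implementation, and the sketch does not model any
special treatment of privileged callers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three kern.expose_address values. */
enum expose_level {
	EXPOSE_NONE = 0,	/* no access */
	EXPOSE_KMEM = 1,	/* processes with /dev/kmem open */
	EXPOSE_ALL  = 2		/* everyone */
};

/* Decide once per sysctl request, not once per process listed. */
bool
may_see_addresses(int level, bool has_kmem_open)
{
	switch (level) {
	case EXPOSE_ALL:
		return true;
	case EXPOSE_KMEM:
		return has_kmem_open;
	default:
		return false;
	}
}

/* Defaults from the log entry: 0 with KASLR, 1 without. */
int
default_expose_level(bool kaslr)
{
	return kaslr ? EXPOSE_NONE : EXPOSE_KMEM;
}

void
demo_expose_address(void)
{
	assert(!may_see_addresses(default_expose_level(true), true));
	assert(may_see_addresses(default_expose_level(false), true));
	assert(!may_see_addresses(default_expose_level(false), false));
	assert(may_see_addresses(EXPOSE_ALL, false));
}
```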


# 1.220 03-Dec-2018 christos

Expose addresses depending on the KASLR setting (from mrg@). Restores the
status quo of exposing kernel addresses if there is no KASLR.


Revision tags: pgoyette-compat-1126
# 1.219 24-Nov-2018 maxv

Fix kernel pointer leaks in the kern.lwp sysctl.


Revision tags: pgoyette-compat-1020
# 1.218 05-Oct-2018 christos

Provide a sysctl kern.expose_address to expose kernel addresses in
sysctl structure returns for non-root. Defaults to off. Turning it
on will restore sockstat/fstat and friends for regular users.


Revision tags: pgoyette-compat-0930
# 1.217 16-Sep-2018 mrg

CTL_DEBUG_MAXID is only used to size a static array that the compiler
can do just fine itself. use the compiler and remove the define.


Revision tags: pgoyette-compat-0906
# 1.216 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
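
The truncation hazard that motivated 1.216 can be shown with a short
sketch. It assumes a 32-bit unsigned int and 64-bit operands; the uimin()
below is a local stand-in written for this example, not the libkern
definition, and the wrapper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for a min defined on unsigned int. */
unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: may evaluate arguments twice, but keeps their full width. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* 64-bit arguments are silently narrowed to unsigned int at the call. */
uint64_t
smaller_via_uimin(uint64_t a, uint64_t b)
{
	return uimin(a, b);
}

uint64_t
smaller_via_macro(uint64_t a, uint64_t b)
{
	return MIN(a, b);
}

void
demo_uimin(void)
{
	uint64_t big = 0x100000001ULL;	/* low 32 bits are 1 */

	/* big is truncated to 1 before the comparison, so 1 "wins". */
	assert(smaller_via_uimin(big, 2) == 1);
	/* The macro compares the full 64-bit values. */
	assert(smaller_via_macro(big, 2) == 2);
}
```

The name uimin makes the operand type visible at the call site, which is
the "better honesty" the log entry refers to.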


# 1.215 22-Aug-2018 msaitoh

- Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.214 04-Feb-2018 maxv

branches: 1.214.2; 1.214.4;
Add a proper defflag for GPROF, and include opt_gprof.h, otherwise we're
not gonna go very far.


Revision tags: netbsd-8-1-RELEASE netbsd-8-1-RC1 netbsd-8-0-RELEASE netbsd-8-0-RC2 netbsd-8-0-RC1 tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.213 01-Jun-2017 chs

remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.212 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.211 31-May-2016 pgoyette

branches: 1.211.2;
Add a new kern.messages sysctl to allow kernel message verbosity to be
altered after boot.

Fixes PR kern/46539 using patch submitted by Nat Sloss.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.210 09-Nov-2015 pgoyette

Whether or not the semaphore code is loaded as a module or built-in, its
sysctl data belongs with the module code. Move it from kern/init_sysctl.c
to kern/uipc_sem.c

While here, add a new sysctl variable kern.posix.semcnt (current count of
semaphores) to complement the existing kern.posix.semmax (maximum number
of semaphores).


Revision tags: nick-nhusb-base-20150921
# 1.209 25-Aug-2015 pooka

Move a bunch of sysctl nodes from init_sysctl (kitchen sink sysctl file)
to init_sysctl_base (only base kernel defs). Main motivation was to
fix sysconf(_SC_NPROCESSORS) for Rumprun. As reported by neeraj on irc,
it returned -1 before this fix, so we were doing imaginary computing.


# 1.208 07-Jul-2015 justin

Move hw.machine and hw.machine_arch sysctls to base so rump can use them

This allows uname(3) and uname(1) to work on rump kernels.


Revision tags: nick-nhusb-base-20150606
# 1.207 20-May-2015 pooka

group msgbuf sysctls with the msgbuf code
(init_sysctl.c -> subr_log.c)


# 1.206 13-May-2015 pgoyette

More preparation for modularizing the SYSVxxx options. Here we
change the kern.ipc.sysvxxx sysctls into dynamic values, so each
sub-component of SYSVxxx can declare its own availability.


# 1.205 22-Apr-2015 pooka

move clock sysctls from init_sysctl.c to kern_clock.c


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.204 03-Aug-2014 apb

branches: 1.204.4;
BUILDINFO part 2: expose sysctl kern.buildinfo


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.203 08-May-2014 hannken

Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15
# 1.202 24-Mar-2014 christos

branches: 1.202.2;
- create cpu_{g,s}etmodel() and hide cpu_model from direct access.


Revision tags: riastradh-drm2-base3
# 1.201 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.200 25-Feb-2014 justin

Add kern.{ostype,osrelease,osrevision,version} kern.domainname,
kern.rawpartition sysctl support to rump kernel.
Moved the sysctl support that is shared between rump and normal
kernels to init_sysctl_base.c as rump cannot use init_sysctl.c
in order to avoid code duplication. Agreed with pooka@.


# 1.199 17-Jan-2014 pooka

Put cprng sysctls into subr_cprng.c. Also, make sysctl_prng static
in subr_cprng and get rid of SYSCTL_PRIVATE namespace leak macro.

Fixes ping(8) when run against a standalone rump kernel due to appearance
of the kern.urandom sysctl node (in case someone was wondering ...)


# 1.198 14-Sep-2013 joerg

GC various arrays defined and used in kern_proc.c


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.197 18-Mar-2013 para

branches: 1.197.6;
calculate vnode cache size based on the resource it gets allocated from
this stops setting kern.maxvnodes too high so it exhausts available space in kmem

http://mail-index.netbsd.org/tech-kern/2013/03/08/msg015095.html


# 1.196 07-Mar-2013 matt

Add a kern.configname sysctl object.


# 1.195 21-Feb-2013 pgoyette

Move boottime50 and its associated sysctl into the compat module. As
noted on tech-kern. Should fix PR/47579.

OK christos@

Will request pull-up to 6.0 in a few days.


# 1.194 02-Feb-2013 matt

Make the inclusion of <sys/cprng.h> a private matter for sysctl. No reason
to expose the rest of the kernel to it.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.193 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


# 1.192 08-Oct-2012 pooka

put all kern socket sysctls in the same place


# 1.191 03-Oct-2012 mlelstv

Add sanity check to sysctl_kern_maxvnodes.


# 1.190 02-Jun-2012 dsl

branches: 1.190.2;
Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.189 07-Apr-2012 christos

remove bogus check.


Revision tags: jmcneill-usbmp-base8 jmcneill-usbmp-base7
# 1.188 10-Mar-2012 joerg

P1003_1B_SEMAPHORE is no longer optional.


Revision tags: jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.187 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-0-5-RELEASE netbsd-6-0-4-RELEASE netbsd-6-0-3-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.186 17-Dec-2011 tls

branches: 1.186.2;

Separate /dev/random pseudodevice implementation from kernel entropy pool
implementation. Rewrite pseudodevice code to use cprng_strong(9).

The new pseudodevice is cloning, so each caller gets bits from a stream
generated with its own key. Users of /dev/urandom get their generators
keyed on a "best effort" basis -- the kernel will rekey generators
whenever the entropy pool hits the high water mark -- while users of
/dev/random get their generators rekeyed every time key-length bits
are output.

The underlying cprng_strong API can use AES-256 or AES-128, but we use
AES-128 because of concerns about related-key attacks on AES-256. This
improves performance (and reduces entropy pool depletion) significantly
for users of /dev/urandom but does cause users of /dev/random to rekey
twice as often.

Also fixes various bugs (including some missing locking and a reseed-counter
overflow in the CTR_DRBG code) found while testing this.

For long reads, this generator is approximately 20 times as fast as the
old generator (dd with bs=64K yields 53MB/sec on 2Ghz Core2 instead of
2.5MB/sec) and also uses a separate mutex per instance so concurrency
is greatly improved. For reads of typical key sizes for modern
cryptosystems (16-32 bytes) performance is about the same as the old
code: a little better for 32 bytes, a little worse for 16 bytes.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base
# 1.185 20-Nov-2011 tls

branches: 1.185.2;
An undocumented behavior of the sysctl kern.arandom node used to allow
sucking up to 8192 bytes out of the kernel arc4random() generator at a
time. Supposedly some very old application code uses this to rekey
other instances of RC4 in userspace (a truly great idea). Reduce the
limit to 256 bytes -- and note that it will probably be reduced to
sizeof(int) in the future, since this node is so documented.


# 1.184 19-Nov-2011 tls

First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.


Revision tags: jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.183 30-Aug-2011 bouyer

branches: 1.183.2;
Add getlabelusesmbr(), as proposed in
http://mail-index.netbsd.org/tech-userlevel/2011/08/25/msg005404.html
This is used by disk tools such as disklabel(8) to dynamically decide if
the underlying platform uses a disklabel-in-mbr-partition or not
(instead of using a compile-time list of ports).
getlabelusesmbr() reads the sysctl kern.labelusesmbr, takes its value from the
machdep #define LABELUSESMBR.
For evbmips, make LABELUSESMBR 1 if the platform uses pmon
as bootloader, and 0 (the previous value) otherwise.


# 1.182 23-Jul-2011 jym

When KERN_SA is not defined, kern.no_sa_support is a constant (1). So
add CTLFLAG_IMMEDIATE to flags. Make the macro block logically reversed so
it looks more natural when reading.

Reported by Peter Tworek on tech-kern@.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.181 24-May-2011 joerg

Add some needed __UNCONST


# 1.180 02-Apr-2011 rmind

vfs_drainvnodes: drop lwp argument, remove variable name in prototype.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.179 05-Feb-2011 christos

avoid code duplication.


# 1.178 28-Jan-2011 pooka

migrate compat32 handling with previous

pointed out by Lars Heidieker


# 1.177 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


# 1.176 22-Jan-2011 christos

Use the L_ flags instead of the P_ flags for lwps.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.175 01-Jul-2010 rmind

branches: 1.175.2; 1.175.4;
Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


# 1.174 16-Jun-2010 pooka

Set kinfo_lwp to 0 before filling it so that if someone removes
variable assignments from here, kernel memory does not leak to
userspace.

Bug found, a little bit surprisingly, by the atf ps test which failed
due to the column width between the -o holdcnt column being too
wide due to the contents displayed being garbage.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.173 13-Feb-2010 yamt

branches: 1.173.2;
sysctl_doeproc: don't follow a possibly stale pointer.


Revision tags: uebayasi-xip-base
# 1.172 13-Jan-2010 pooka

branches: 1.172.2;
Minimize unnecessary differences in rump.


# 1.171 24-Dec-2009 elad

When reporting open files using sysctl, don't use 'filehead' to fetch files,
as we don't have a process context to authorize on. Instead, traverse the
file descriptor table of each process -- as we already do in one case.

Introduce a "marker" we can use to mark files we've seen in an iteration, as
the same file can be referenced more than once.

Hopefully this availability of filtering by process also makes life easier
for those who are interested in implementing process "containers" etc.


Revision tags: matt-premerge-20091211
# 1.170 12-Dec-2009 dsl

Report L_INMEM in the lwp info as well.


# 1.169 12-Dec-2009 dsl

Always set L_INMEM to maintain binary compatibility.


Revision tags: jym-xensuspend-nbase
# 1.168 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.167 16-Sep-2009 pooka

Chop init_sysctl into base nodes (init_sysctl_base.c) and the
kitchen sink (init_sysctl.c). Further surgery may be needed down
the line.


Revision tags: yamt-nfs-mp-base8
# 1.166 11-Sep-2009 apb

Expose the kernel's boothowto(9) variable through the sysctl
kern.boothowto variable.

Part of the /etc/rc silent changes requested in PR 41946
and proposed in tech-userlevel.


Revision tags: yamt-nfs-mp-base7
# 1.165 16-Aug-2009 christos

provide compatibility for the older variant of kern.consdev, which used
a 32 bit dev_t. Reported by mrg.


Revision tags: jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.164 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


# 1.163 16-May-2009 yamt

sysctl_doeproc:
- simplify.
- KERN_PROC: fix possible stale proc pointer dereference.
- KERN_PROC: don't do copyout with proc_lock held.


Revision tags: yamt-nfs-mp-base4 jym-xensuspend-base
# 1.162 12-May-2009 yamt

don't forget to skip marker processes.


# 1.161 04-May-2009 yamt

sysctl_doeproc: fix a bug in rev.1.135.
don't forget to mark our marker process PK_MARKER.
this fixes crashes in sched_pstats, etc.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.160 29-Mar-2009 mrg

- add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.

- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.

- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)

- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)

- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.

- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but it's best left
as part of the full-update of compat/freebsd.)


this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.

tested on i386 and sparc64, build tested on several other platforms.

thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)


# 1.159 11-Mar-2009 mrg

like KERN_FILE2: *do* update "needed" when there is no count. we want
userland to know what sort of size to provide..

while here, slightly normalise the previous to init_sysctl.c.


# 1.158 11-Mar-2009 mrg

always calculate "needed" for KERN_FILE2 calls. this allows a caller
to get an estimate of the needed space, like the intention is.


# 1.157 08-Mar-2009 ad

Don't bother with file_t::f_iflags any more, as it's not used.
Noted by mrg@.


Revision tags: nick-hppapmap-base2
# 1.156 13-Feb-2009 apb

Use "defopt MODULAR" in sys/conf/files, and #include "opt_modular.h"
in all kernel sources that use the MODULAR option.
Proposed in tech-kern on 18 Jan 2009.


Revision tags: mjf-devfs2-base
# 1.155 17-Jan-2009 cegger

branches: 1.155.2;
whitespace nit


# 1.154 17-Jan-2009 yamt

malloc -> kmem_alloc.


# 1.153 11-Jan-2009 christos

merge christos-time_t


Revision tags: christos-time_t-nbase christos-time_t-base
# 1.152 29-Dec-2008 pooka

Rename specfs_lock as device_lock and move it from specfs to devsw.
Relaxes kernel dependency on vfs.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.151 28-Nov-2008 elad

PR/40002: Daniel Horecki: sockstat doesn't work for user with sysctl
security.curtain=1

If the kauth call failed, we'd silently continue the loop, but the error
code would remain and eventually "leak" to userspace. Reset the error to
zero when continuing.
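
A minimal sketch of the bug pattern described above, in plain C rather than
kernel code. The names sketch_collect() and sketch_deny_odd() are hypothetical
stand-ins for the sysctl loop and the kauth(9) check; only the "reset the
error before continuing" step is taken from the commit message.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical authorizer: hides odd-numbered items from the caller. */
int
sketch_deny_odd(int item)
{
	return (item % 2) ? EPERM : 0;
}

/*
 * Walk the items, counting those the caller may see.  A failed check
 * only skips the entry; it must not become the return value.
 */
int
sketch_collect(const int *items, size_t n, int (*authorize)(int),
    size_t *visible)
{
	int error = 0;

	*visible = 0;
	for (size_t i = 0; i < n; i++) {
		error = authorize(items[i]);
		if (error != 0) {
			/* Without this reset a stale EPERM would leak out. */
			error = 0;
			continue;
		}
		(*visible)++;
	}
	return error;
}
```

If the last entry examined is a hidden one, dropping the reset makes the whole
call fail even though the visible entries were collected correctly.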

Tested by snj@ and myself. Okay snj@.


# 1.150 12-Nov-2008 ad

Allow the POSIX semaphore code to be loaded as a module.


Revision tags: netbsd-5-base matt-mips64-base2
# 1.149 22-Oct-2008 ad

branches: 1.149.2; 1.149.4;
Set kern.posix_semaphores at runtime so it can be a module.
(Picked wrong header the last time.)


# 1.148 22-Oct-2008 ad

Set kern.posix_semaphores at runtime so it can be a module.


Revision tags: haad-dm-base1
# 1.147 19-Oct-2008 christos

rename proc_representative_lwp to proc_active_lwp and clarify it is for
ps display purposes. suggested by rmind.


# 1.146 19-Oct-2008 christos

Select a "representative" lwp instead of the first lwp in the list. The
first lwp in the list is the last created and in the firefox and gtk-gnash
case this is usually a zombie, so the status in ps was ZLl. This now picks
the lwp in order ONPROC > RUN > SLEEP > STOP > SUSPENDED > IDL > DEAD > ZOMB
and breaks ties using cpticks.
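
The selection order above can be sketched as follows. This is an illustration
only: the enum and struct are hypothetical (they are not the kernel's LS*
constants or struct lwp), and it assumes that ties are broken in favour of the
LWP with more cpticks, which the message does not spell out.

```c
#include <stddef.h>

/* Hypothetical ranks, most preferred first, in the order given above. */
enum sketch_lwp_state {
	SK_ONPROC, SK_RUN, SK_SLEEP, SK_STOP,
	SK_SUSPENDED, SK_IDL, SK_DEAD, SK_ZOMB
};

struct sketch_lwp {
	enum sketch_lwp_state state;
	unsigned int cpticks;
};

/* Return the LWP to show in ps, or NULL if the list is empty. */
const struct sketch_lwp *
sketch_pick_lwp(const struct sketch_lwp *lwps, size_t n)
{
	const struct sketch_lwp *best = NULL;

	for (size_t i = 0; i < n; i++) {
		const struct sketch_lwp *l = &lwps[i];

		if (best == NULL || l->state < best->state ||
		    (l->state == best->state && l->cpticks > best->cpticks))
			best = l;	/* better state, or same state with more ticks */
	}
	return best;
}
```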


# 1.145 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.144 15-Jul-2008 christos

make l_flags contain more stuff. Fixes top thread display where system processes
were always displayed.


# 1.143 02-Jul-2008 rmind

branches: 1.143.2;
Remove proc_representative_lwp(), use a simple LIST_FIRST() instead.
OK by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 wrstuden-revivesa-base
# 1.142 16-Jun-2008 ad

PR kern/38927: processes getting stuck in uvm_map (cv_timedwait), hanging
machine

Assume that a vnode (and associated data structures) costs 2kB in the
worst imaginable case. Don't allow sysctl to set desiredvnodes to a
value that would use more than 75% of KVA or 75% of physical memory.
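
The arithmetic behind that limit can be sketched like this. The function names
and the way the KVA and physical memory sizes are passed in are assumptions
for illustration; only the 2kB cost and the 75% ceiling come from the message.

```c
#include <stdint.h>

/* Worst-case cost of one vnode and its associated structures: 2kB. */
#define SKETCH_VNODE_COST	2048u

/* Largest vnode count that stays within 75% of both KVA and RAM. */
uint64_t
sketch_max_vnodes(uint64_t kva_bytes, uint64_t phys_bytes)
{
	uint64_t limit = kva_bytes < phys_bytes ? kva_bytes : phys_bytes;

	return (limit / 4 * 3) / SKETCH_VNODE_COST;
}

/* Nonzero if a requested desiredvnodes value would be accepted. */
int
sketch_vnodes_ok(uint64_t want, uint64_t kva_bytes, uint64_t phys_bytes)
{
	return want <= sketch_max_vnodes(kva_bytes, phys_bytes);
}
```

With 1GB of KVA and 4GB of physical memory, for example, the smaller of the
two governs and the ceiling works out to 393216 vnodes.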


# 1.141 16-Jun-2008 ad

- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


Revision tags: yamt-pf42-base3
# 1.140 31-May-2008 ad

branches: 1.140.2;
Kill devsw_lock and just use specfs_lock. The two would need merging
in order to prevent unload of modules when a device that they provide
is still open.


# 1.139 25-May-2008 christos

don't forget to fill in the emulation.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.138 12-May-2008 ad

Use cpu_index(), not ci_cpuid.


# 1.137 30-Apr-2008 ad

branches: 1.137.2;
KERN_FILE_BYPID: fix locking botch.


# 1.136 29-Apr-2008 ad

Don't try grabbing a zombie's p_reflock.


# 1.135 29-Apr-2008 ad

PR kern/37917 /bin/ps no longer shows zombies


# 1.134 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.133 24-Apr-2008 ad

branches: 1.133.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.132 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.131 05-Apr-2008 yamt

branches: 1.131.2;
- l_wmesg is not always valid. check l_wchan when using l_wmesg.
should fix a crash reported by Juan RP on current-users@.
- ttyinfo: lock lwp when accessing l_wmesg.
- fill_lwp: add an assertion.


# 1.130 04-Apr-2008 cegger

use device_xname() where appropriate
OK martin


# 1.129 02-Apr-2008 xtraeme

Revert rev 1.126-1.128. The original code was correct and rmind and I
didn't look correctly at them.


# 1.128 01-Apr-2008 xtraeme

When copying l_name and l_wmesg use KI_LNAMELEN and KI_WMESGLEN
respectively, so that we don't care if l_name/wmesg is longer
than kl_name/wmesg and the KASSERTs added in previous can go away.


# 1.127 01-Apr-2008 xtraeme

Fix previous: use the length of l->l_foo not kl->l_foo and add
two KASSERTs to check for max length limits before copying.

As suggested by rmind@.


# 1.126 01-Apr-2008 xtraeme

fill_lwp: when copying l_wmesg and l_name, use the size of the string
not of the variable.

Found and ok by rmind@.


# 1.125 27-Mar-2008 ad

branches: 1.125.2;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.124 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.123 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.122 30-Jan-2008 ad

branches: 1.122.2; 1.122.6;
Another locking botch.


# 1.121 28-Jan-2008 ad

More file/proc locking fixes.


Revision tags: bouyer-xeni386-nbase
# 1.120 23-Jan-2008 elad

Tons of process scope changes.

- Add a KAUTH_PROCESS_SCHEDULER action, to handle scheduler related
requests, and add specific requests for set/get scheduler policy and
set/get scheduler parameters.

- Add a KAUTH_PROCESS_KEVENT_FILTER action, to handle kevent(2) related
requests.

- Add a KAUTH_DEVICE_TTY_STI action to handle requests to TIOCSTI.

- Add requests for the KAUTH_PROCESS_CANSEE action, indicating what
process information is being looked at (entry itself, args, env,
open files).

- Add requests for the KAUTH_PROCESS_RLIMIT action indicating set/get.

- Add requests for the KAUTH_PROCESS_CORENAME action indicating set/get.

- Make bsd44 secmodel code handle the newly added requests appropriately.

All of the above make it possible to issue finer-grained kauth(9) calls in
many places, removing some KAUTH_GENERIC_ISSUSER requests.

- Remove the "CAN" from KAUTH_PROCESS_CAN{KTRACE,PROCFS,PTRACE,SIGNAL}.

Discussed with christos@ and yamt@.


Revision tags: bouyer-xeni386-base
# 1.119 12-Jan-2008 ad

sysctl_kern_proc_args: avoid zero length allocation.


Revision tags: matt-armv6-base
# 1.118 07-Jan-2008 ad

Patch up sysctl locking:

- Lock processes, credentials, filehead etc correctly.
- Acquire a read hold on sysctl_treelock if only doing a query.
- Don't wire down the output buffer. It doesn't work correctly and the code
regularly does long term sleeps with it held - it's not worth it.
- Don't hold locks other than sysctl_lock while doing copyout().
- Drop sysctl_lock while doing copyout / allocating memory in a few places.
- Don't take kernel_lock for sysctl.
- Fix a number of bugs spotted along the way


# 1.117 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.116 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.115 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.114 10-Dec-2007 elad

- Use KAUTH_ARG() instead of casts,
- Don't ignore return value of settime() in sysctl_kern_rtc_offset(), as
suggested by yamt@.

Note: the kauth(9) call in sysctl_kern_rtc_offset() is bogus, but this will
be addressed separately.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base jmcneill-pm-base reinoud-bufcleanup-base
# 1.113 06-Nov-2007 ad

branches: 1.113.2; 1.113.4; 1.113.6;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.112 19-Oct-2007 ad

branches: 1.112.2;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h


Revision tags: yamt-x86pmap-base4
# 1.111 16-Oct-2007 christos

branches: 1.111.2;
Don't fail to produce the argument vector if the program has modified it
by deleting arguments. This is a popular practice, and failing means that
ps(1) prints (programname). For example this is what XtOpenDisplay() does with
-geometry. This used to work before 2.0H, and the behavior is allowed and
hinted by POSIX. Found out by Anon Ymous.


# 1.110 16-Oct-2007 christos

- fix comment sentence capitalization.
- whitespace cleanup.
No functional changes.


# 1.109 15-Oct-2007 ad

Add _SC_NPROCESSORS_ONLN and _SC_NPROCESSORS_CONF for sysconf(). These
are extensions but are provided by many Unix systems.


Revision tags: yamt-x86pmap-base3
# 1.108 13-Oct-2007 rmind

sysctl_kern_lwp: Use a correct variable when rechecking if LWP still
exists after relocking. Found via CID: 4689. OK by <dsl>.


Revision tags: vmlocking-base
# 1.107 08-Oct-2007 ad

Merge from vmlocking: don't hold scheduler locks across copyout().


Revision tags: yamt-x86pmap-base2
# 1.106 28-Sep-2007 joerg

Add kern.no_sa_support to easily detect whether a kernel supports
Scheduler Activation or not. This is a negative name as ld.so.conf
conditionals treat undefined sysctls like 0.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.105 15-Aug-2007 ad

branches: 1.105.2; 1.105.4;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.104 06-Aug-2007 yamt

branches: 1.104.2;
remove a homegrown definition of CPU_INFO_FOREACH.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.103 09-Jul-2007 ad

branches: 1.103.2; 1.103.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.102 30-Jun-2007 dsl

Add a flags parameter to kauth_cred_get/setgroups() so that sys_get/setgroups
can copy directly to/from userspace.
Avoids exposing the implementation of the group list as an array to code
outside kern_auth.c.
compat code and man page need updating.


# 1.101 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still needs work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.100 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


Revision tags: thorpej-atomic-base
# 1.99 11-Mar-2007 ad

branches: 1.99.2;
Add the LWP's runtime to kinfo_lwp.


# 1.98 09-Mar-2007 ad

branches: 1.98.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


Revision tags: ad-audiomp-base
# 1.97 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.96 15-Feb-2007 ad

branches: 1.96.2;
Count the number of CPUs at boot and stash in 'ncpu'. Eventually should
have each CPU register at attach, so we can figure out the topology for
the scheduler.


Revision tags: post-newlock2-merge
# 1.95 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase newlock2-base
# 1.94 22-Jan-2007 elad

Don't rely on KAUTH_PROCESS_CANSEE for environment just yet,
otherwise we're allowing anyone to read the environment unless
curtain is enabled.

From yamt@.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 netbsd-4-base
# 1.93 27-Nov-2006 elad

branches: 1.93.2;
Move Veriexec's sysctl(9) setup routine and helper to kern_verifiedexec.c.


# 1.92 25-Nov-2006 christos

PR/34837: Mindaguas: Add SysV SHM dynamic reallocation and locking to the
physical memory


# 1.91 01-Nov-2006 christos

implement kern.arandom properly, instead of lying about it and only filling
the first 4 bytes of the array with random data.


# 1.90 29-Oct-2006 christos

add the emulation in kinfo_proc2


Revision tags: yamt-splraiseipl-base2
# 1.89 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.88 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


# 1.87 24-Sep-2006 dogcow

correct dcopyout #define for !KTRACE case.


# 1.86 23-Sep-2006 manu

Add a -t+S flag to ktrace for tracing activity related to sysctl. MIB
names will be displayed, with data read and written as well.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.85 13-Sep-2006 elad

branches: 1.85.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.84 10-Sep-2006 manu

When getting the program argument or environment string, we previously
assumed that all the strings were stored in a row, separated by NUL chars,
at the address pointed to by argv[0] (or envp[0]).

This was wrong: if the program changed argv[0], we still read the
first string correctly, but the next strings did contain unexpected data.

The fix: read the whole argv (or envp) array, then copy the strings one by
one, using their addresses in argv (or envp).
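
A userland sketch of the fixed approach, assuming the pointer array has
already been read. sketch_pack_args() is a hypothetical helper, not the
kernel routine; the real code copies from another process' address space,
which is left out here.

```c
#include <stddef.h>
#include <string.h>

/*
 * Pack argv[0..argc-1] into out as consecutive NUL-terminated strings,
 * following each pointer individually instead of assuming the strings
 * sit back to back after argv[0].  Returns 0 on success, -1 if out is
 * too small; *used receives the number of bytes written.
 */
int
sketch_pack_args(char *const argv[], size_t argc, char *out, size_t outlen,
    size_t *used)
{
	*used = 0;
	for (size_t i = 0; i < argc; i++) {
		size_t len = strlen(argv[i]) + 1;	/* include the NUL */

		if (len > outlen - *used)
			return -1;
		memcpy(out + *used, argv[i], len);
		*used += len;
	}
	return 0;
}
```

Because every string is fetched through its own pointer, a program that
repoints argv[0] at a different buffer no longer corrupts the remaining
arguments in the output.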


# 1.83 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: rpaulo-netinet-merge-pcb-base
# 1.82 08-Sep-2006 manu

When collecting a 32 bit process' argument or environment vector, we need
to convert 32 bit pointers to the 64 bit environment


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.81 26-Jul-2006 dogcow

branches: 1.81.4;
at the request of elad, as veriexec.h has returned, revert the changes
from 2006-07-25.


# 1.80 25-Jul-2006 dogcow

mechanically go through and
s,include "veriexec.h",include <sys/verified_exec.h>,
as the former has apparently gone away.


# 1.79 24-Jul-2006 elad

some fixes:
- adapt to NVERIEXEC in init_sysctl.c.
- we now need "veriexec.h" for NVERIEXEC.
- "opt_verified_exec.h" -> "opt_veriexec.h", and include it only where
it is needed.


# 1.78 23-Jul-2006 ad

Use the LWP cached credentials where sane.


# 1.77 17-Jul-2006 ad

- Don't cast kauth_cred_t to (struct ucred *), just set pc_ucred = NULL.
- Fill ucred::cr_ref.


# 1.76 16-Jul-2006 elad

CURTAIN() -> KAUTH_GENERIC_CANSEE.


# 1.75 14-Jul-2006 elad

move security.setid_core.* to kern.coredump.setid.*, as requested by yamt@.


Revision tags: yamt-pdpolicy-base6
# 1.74 21-Jun-2006 christos

Don't leak memory on success. Allocate only the type of struct that we'll
need for efficiency.


# 1.73 20-Jun-2006 christos

don't allocate too much stuff on the stack.


Revision tags: chap-midi-nbase chap-midi-base
# 1.72 17-Jun-2006 yamt

sysctl_security_setidcorename: don't allocate MAXPATHLEN bytes on stack.


Revision tags: gdamore-uart-base
# 1.71 13-Jun-2006 yamt

branches: 1.71.2;
remove unnecessary arguments from kauth_authorize_process.
ie. make it similar to the one found in apple TN.


# 1.70 13-Jun-2006 yamt

sysctl_kern_file, sysctl_kern_file2: don't abuse kauth_authorize_process
for non-process objects.


# 1.69 13-Jun-2006 yamt

sysctl_kern_file2: fix an indent.


Revision tags: yamt-pdpolicy-base5 simonb-timecounters-base
# 1.68 14-May-2006 elad

branches: 1.68.2;
integrate kauth.


Revision tags: elad-kernelauth-base
# 1.67 17-Apr-2006 elad

Move securelevel-specific stuff to its own file.


# 1.66 14-Apr-2006 blymn

Make i/o statistics collection more generic, include tape drives and
nfs mounts in the set of devices that statistics will be reported on.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.65 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.64 26-Mar-2006 erh

When DIAGNOSTIC is defined, provide a kern.panic_now sysctl to conveniently
and reliably panic the system


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.63 01-Mar-2006 yamt

branches: 1.63.2; 1.63.4; 1.63.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.62 04-Feb-2006 yamt

for some random places, use PNBUF_GET/PUT rather than
- on-stack buffer
- malloc(MAXPATHLEN)


# 1.61 02-Feb-2006 elad

branches: 1.61.2;
implement a security.setid_core node as discussed on tech-kern@ and
tech-security@.


# 1.60 27-Jan-2006 elad

branches: 1.60.2;
remove security node sysctl objects; they are now created using CTL_CREATE.


# 1.59 26-Dec-2005 perry

branches: 1.59.2;
u_intN_t -> uintN_t


# 1.58 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.57 05-Dec-2005 christos

- make settime take timespec.
- avoid wrapping of time in settime.
- pass struct proc down so that we can log a detailed message.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.56 08-Oct-2005 yamt

sysctl_kern_proc_args: don't assume that the process is
resident while we are sleeping.


# 1.55 07-Sep-2005 elad

Implement curtain in KERN_{PROC,PROC2,FILE,FILE2,PROC_ARGS}.
While I'm here, disable curtain by default.


# 1.54 07-Sep-2005 elad

Introduce ``security.curtain'', new node for security features and
settings, and new variable for controlling access to objects based
on user-id.


# 1.53 06-Sep-2005 rpaulo

Implement kern.hardclock_ticks.


# 1.52 24-Aug-2005 simonb

Fix a tyop in a comment.


# 1.51 13-Aug-2005 blymn

Remove the tape stats from here, they caused issues on non-scsipi
architectures.


# 1.50 08-Aug-2005 blymn

Don't include tape stats functions if no devices configured.


# 1.49 07-Aug-2005 blymn

Add tape statistics gathering functions.


# 1.48 29-Jul-2005 elad

#ifdef VERIFIED_EXEC


# 1.47 16-Jul-2005 christos

defopt verified_exec.


# 1.46 17-Jun-2005 atatat

branches: 1.46.2;
Comment in new cp_id implementation was wrong since I abandoned
rewriting it in favor of some testing and then never got back to it.
It's better now.


# 1.45 16-Jun-2005 christos

Add a new sysctl 'cp_id' that returns the array of cpu id values. Requested by
me, implemented by atatat.


# 1.44 15-Jun-2005 elad

Fix sysctl handling for raise-only variables. This affected the veriexec
node entirely. Reported by Nino Dehne.


# 1.43 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.42 06-Jun-2005 jdc

Revert previous ('_ncpus' is now 'ncpus' again).
MI variable names have precedence.


# 1.41 05-Jun-2005 jdc

Rename 'ncpus' to '_ncpus', otherwise we shadow sparc/sparc64's 'ncpus'
when MULTIPROCESSOR is defined.


# 1.40 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.39 22-May-2005 elad

Add indication for number of fingerprinted files on each device.

When a table is created for a new device, a new variable is created
under the kern.veriexec.count node named "dev_<id>". For example,
dev_0, dev_3, etc.


# 1.38 19-May-2005 elad

Some changes in veriexec.

New features:

- Add a veriexec_report() routine to make most reporting consistent and
remove some common code.
- Add 'strict' mode that controls how veriexec behaves.
- Add sysctl knobs:
o kern.veriexec.verbose controls verbosity levels. Value: 0, 1.
o kern.veriexec.strict controls strict level. Values: 0, 1, 2. See
documentation in sysctl(3) for details.
o kern.veriexec.algorithms returns a string with a space separated
list of supported hashing algorithms in veriexec.
- Updated documentation in man pages for sysctl(3) and sysctl(8).

Bug fixes:

- veriexec_removechk(): Code cleanup + handle FINGERPRINT_NOTEVAL
correctly.
- exec_script(): Don't pass 0 as flag when executing a script; use the
defined VERIEXEC_INDIRECT - which is 1. Makes indirect execution
enforcement work.
- Fix some printing formats and types..


Revision tags: kent-audio2-base
# 1.37 18-Apr-2005 mrg

be explicit in the description for POSIX saved set-id that this is for
POSIX-style, not sane-style. (ie, add "POSIX " to the description.)


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.36 11-Mar-2005 atatat

branches: 1.36.2;
Revert the change that made kern.file2 and net.*.*.pcblist into nodes
instead of structs. It had other deleterious side-effects that are
rather nasty. Another solution must be found.


# 1.35 10-Mar-2005 atatat

Change types of kern.file2 and net.*.*.pcblist to NODE


# 1.34 09-Mar-2005 atatat

Add kern.file2. As kern.proc2 is to kern.proc, so is kern.file2 to
kern.file, namely a 32/64 bit clean sysctl interface to the same data.
It also borrows a few things from struct vnode (if applicable) and
from struct proc, just to tie things together a bit more.

You can walk this list "by file" or "by pid". The former method is
similar to kern.file but omits the filehead, and the latter can give
you duplicates if multiple processes have the same struct file open,
but tells you which process it is.


# 1.33 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.32 01-Oct-2004 yamt

branches: 1.32.4; 1.32.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.31 27-Jul-2004 atatat

branches: 1.31.2;
The message buffer datum instrumented by KERN_MSGBUFSIZE is actually a
long, not an int, and this causes "problems" on LP64be machines
(sparc64, etc). Assign the value to a temporary int and instrument
that instead. Should be fine until someone wants a message buffer
larger than two gigabytes.
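
Why this bites only on LP64 big-endian machines can be illustrated by
modelling the byte layout explicitly. The helpers below are hypothetical and
host-independent; they only show that the first four bytes of a big-endian
64-bit value hold its high half.

```c
#include <stdint.h>

/* Lay out a 64-bit value the way an LP64 big-endian machine stores a long. */
void
sketch_store_be64(uint64_t v, unsigned char out[8])
{
	for (int i = 0; i < 8; i++)
		out[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* Read four bytes as a big-endian 32-bit int, as an int-sized copy would. */
uint32_t
sketch_load_be32(const unsigned char *in)
{
	return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
	    ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```

Reading an int at the start of such a long yields the high half, which is
zero for any realistic buffer size; copying the value into a real int first,
as the commit does, avoids that.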


# 1.30 26-May-2004 christos

(off_t)(long) is wrong when it comes to kernel addresses [because on a 32 bit
machine if the high bit is set they turn negative]. Make an intermediate cast
to unsigned long.
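
The effect can be reproduced in portable C by modelling the 32-bit long with
fixed-width types. The function names are hypothetical, and the narrowing
conversion to int32_t is implementation-defined in ISO C, although common
compilers wrap it as shown.

```c
#include <stdint.h>

/* (off_t)(long)addr on a 32-bit machine: sign-extends high addresses. */
int64_t
sketch_addr_via_long(uint32_t addr)
{
	return (int64_t)(int32_t)addr;
}

/* (off_t)(unsigned long)addr: zero-extends, so the value stays positive. */
int64_t
sketch_addr_via_ulong(uint32_t addr)
{
	return (int64_t)(uint32_t)addr;
}
```

A typical 32-bit kernel address such as 0xc0000000 has its high bit set, so
only the second form preserves it as a 64-bit offset.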


# 1.29 03-May-2004 martin

Fix a comment.
Approved by Andrew Brown.


# 1.28 23-Apr-2004 simonb

s/the the/the/ (only in sources that aren't regularly imported from
elsewhere).


# 1.27 16-Apr-2004 atatat

Prefer that kern.hostid is printed in hex, not as a signed decimal,
and avoid accidental sign-extension when setting it.


# 1.26 08-Apr-2004 atatat

Lots of sysctl descriptions (if someone wants to help out here, that
would be good) mostly copied from sysctl(3). This takes care of the
top-level, most of kern.* and hw.* (modulo the ath and bge stuff), and
all of proc.*.

If you don't want the added rodata in your kernel, use "options
SYSCTL_NO_DESCR" in your kernel config.


# 1.25 08-Apr-2004 atatat

Clear out the struct kinfo_drivers before stuffing things into it.
Avoids leaking garbage from the stack (left over from the earlier
call to sysctl_locate()).


Revision tags: netbsd-2-0-base
# 1.24 24-Mar-2004 atatat

branches: 1.24.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.23 17-Mar-2004 yamt

- move kern.somaxkva sysctl stuff from init_sysctl.c to uipc_socket.c.
- when changing its value, wakeup sokva waiters.


# 1.22 21-Feb-2004 atatat

Use KERN_PROCSLOP for struct kinfo_proc and KERN_LWPSLOP for
struct kinfo_lwp, and not vice versa.

Should solve the issue with top dying because it's unable to "allocate
memory".


# 1.21 19-Feb-2004 atatat

Use new PTRTOUINT64() macro instead of local PTRTOINT64() macro.


# 1.20 17-Jan-2004 atatat

Avoid dereferencing l...it might be NULL


# 1.19 28-Dec-2003 atatat

Sysctl functions called for "generic" nodes should forward "query"
requests (where possible), rather than returning errors.


# 1.18 28-Dec-2003 atatat

Adjust error returns in kern.cp_time when a specific processor is
being requested so that (1) the uniprocessor case and the
multiprocessor case are more similar and (2) so that we return ENOENT
when a non-existent processor is requested (which is both more
sensible and follows the general order of things anyway).


# 1.17 28-Dec-2003 atatat

Rename sysctl_kern_hostname() to sysctl_setlen() and use it also for
domainname. Note that there's no need to copy rnode since we're not
changing any of it, nor protecting anything from change.

Thanks to martin for initial work.


# 1.16 28-Dec-2003 atatat

RCSid police


# 1.15 28-Dec-2003 martin

After changing hostname, adjust hostnamelen.
This closes PR kern/23907.


# 1.14 26-Dec-2003 martin

Make kern.rtc_offset writable at securelevel <= 0.
This allows boot-time adjustment when a machine runs other OSes with
RTC == localtime.


# 1.13 20-Dec-2003 yamt

update a comment to match with the previous change (rev.1.12).


# 1.12 20-Dec-2003 yamt

restore functionality to decrease kern.maxvnodes which
has been backed out during sysctl rework.


# 1.11 12-Dec-2003 simonb

In sysctl_kern_lwp adjust offsets into the mib entries so that
they are now correct. Fixes problems with "ps -s" not working.
Also use KERN_LWPSLOP instead of KERN_PROCSLOP.

Both changes from Andrew Brown.


# 1.10 10-Dec-2003 atatat

Make kern.dump_on_panic writeable again, too


# 1.9 09-Dec-2003 atatat

Make kern.sbmax writeable again as well.

From a follow-on to PR kern/23695 by a Mr. Davis, which I missed at a
quick glance.


# 1.8 09-Dec-2003 atatat

Make kern.logsigexit writeable again.

Fixes PR kern/23695.


# 1.7 07-Dec-2003 martin

Add missing break.


# 1.6 07-Dec-2003 he

Also make declaration of sysctl_kern_maxptys() depend on NPTY > 0.
Makes the mvme68k RAMDISK kernel compile again.


# 1.5 06-Dec-2003 martin

Fix kern.cp_time for MULTIPROCESSOR kernels: calculate size of result
correctly, free original instead of incremented pointer, copy results for
n = -2 case too, so top shows correct stats.
Additionally, rearrange code for better readability (from Andrew).


# 1.4 06-Dec-2003 fvdl

Include opt_posix.h for the P1003_1B_SEMAPHORE define.
Include <machine/cpu.h> just to be sure.


# 1.3 06-Dec-2003 martin

We can not count CPUs at sysctl initialization time - so don't make
hw.ncpu an immediate value.


# 1.2 06-Dec-2003 atatat

#include "opt_multiprocessor.h"

This makes hw.ncpu and kern.cp_time work better on those platforms.


# 1.1 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118
# 1.222 15-Jan-2019 mrg

remove kern.panic_now -- crashme panic node replaces it.


Revision tags: pgoyette-compat-1226
# 1.221 05-Dec-2018 christos

As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
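
The tri-state policy listed above can be sketched in a few lines of C. This is only an illustration of the decision table in the log message: the names `default_expose_sketch` and `may_expose_sketch` are hypothetical, and the real get_expose_address() may take different arguments and perform additional checks.

```c
#include <assert.h>

/* Values of kern.expose_address as described in the log message. */
enum {
	EXPOSE_NONE = 0,	/* no access */
	EXPOSE_KMEM = 1,	/* processes with /dev/kmem open */
	EXPOSE_ALL  = 2		/* everyone */
};

/* Hypothetical helper: default setting, 0 with KASLR and 1 without. */
static int
default_expose_sketch(int kaslr_enabled)
{
	return kaslr_enabled ? EXPOSE_NONE : EXPOSE_KMEM;
}

/*
 * Hypothetical helper: returns 1 if kernel addresses may be shown to a
 * caller, given the sysctl setting and whether the caller has /dev/kmem
 * open. Unknown settings are treated as "no access".
 */
static int
may_expose_sketch(int setting, int has_kmem_open)
{
	assert(has_kmem_open == 0 || has_kmem_open == 1);

	switch (setting) {
	case EXPOSE_ALL:
		return 1;
	case EXPOSE_KMEM:
		return has_kmem_open;
	default:
		return 0;
	}
}
```

Evaluating the policy once per sysctl request, rather than once per process, corresponds to calling such a helper a single time and reusing the result inside the loop.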


# 1.220 03-Dec-2018 christos

Expose addresses depending on the KASLR setting (from mrg@). Restores the
status quo of exposing kernel addresses if there is no KASLR.


Revision tags: pgoyette-compat-1126
# 1.219 24-Nov-2018 maxv

Fix kernel pointer leaks in the kern.lwp sysctl.


Revision tags: pgoyette-compat-1020
# 1.218 05-Oct-2018 christos

Provide a sysctl kern.expose_address to expose kernel addresses in
sysctl structure returns for non-root. Defaults to off. Turning it
on will restore sockstat/fstat and friends for regular users.


Revision tags: pgoyette-compat-0930
# 1.217 16-Sep-2018 mrg

CTL_DEBUG_MAXID is only used to size a static array that the compiler
can do just fine itself. use the compiler and remove the define.


Revision tags: pgoyette-compat-0906
# 1.216 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
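
The truncation hazard described above can be reproduced in userland. The following is a minimal sketch, assuming a platform where unsigned int is 32 bits wide; `uimin_sketch` and `MIN_SKETCH` are hypothetical stand-ins written for this example, not the libkern definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a min() that is defined on unsigned int. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Macro form, as some subsystems define it. It does not truncate, but
 * its arguments may be evaluated more than once.
 */
#define MIN_SKETCH(a, b)	((a) < (b) ? (a) : (b))

/*
 * Returns 1 if the function form gives a different answer from the
 * macro form for 64-bit inputs (assumes 32-bit unsigned int).
 */
static int
truncation_demo(void)
{
	uint64_t big = (UINT64_C(1) << 32) + 5;	/* needs more than 32 bits */
	uint64_t small = 7;

	/* Both arguments are silently converted to unsigned int here. */
	uint64_t f = uimin_sketch(big, small);
	uint64_t m = MIN_SKETCH(big, small);

	assert(m == 7);		/* the macro compares full 64-bit values */
	return f != m;
}
```

With the old generic name, nothing at the call site hints that the first argument loses its upper bits; the uimin/uimax spelling at least states the operand type.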


# 1.215 22-Aug-2018 msaitoh

- Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.214 04-Feb-2018 maxv

branches: 1.214.2;
Add a proper defflag for GPROF, and include opt_gprof.h, otherwise we're
not gonna go very far.


Revision tags: netbsd-8-0-RELEASE netbsd-8-0-RC2 netbsd-8-0-RC1 tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.213 01-Jun-2017 chs

remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.212 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.211 31-May-2016 pgoyette

branches: 1.211.2;
Add a new kern.messages sysctl to allow kernel message verbosity to be
altered after boot.

Fixes PR kern/46539 using patch submitted by Nat Sloss.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.210 09-Nov-2015 pgoyette

Whether or not the semaphore code is loaded as a module or built-in, its
sysctl data belongs with the module code. Move it from kern/init_sysctl.c
to kern/uipc_sem.c

While here, add a new sysctl variable kern.posix.semcnt (current count of
semaphores) to complement the existing kern.posix.semmax (maximum number
of semaphores).


Revision tags: nick-nhusb-base-20150921
# 1.209 25-Aug-2015 pooka

Move a bunch of sysctl nodes from init_sysctl (kitchen sink sysctl file)
to init_sysctl_base (only base kernel defs). Main motivation was to
fix sysconf(_SC_NPROCESSORS) for Rumprun. As reported by neeraj on irc,
it returned -1 before this fix, so we were doing imaginary computing.


# 1.208 07-Jul-2015 justin

Move hw.machine and hw.machine_arch sysctls to base so rump can use them

This allows uname(3) and uname(1) to work on rump kernels.


Revision tags: nick-nhusb-base-20150606
# 1.207 20-May-2015 pooka

group msgbuf sysctls with the msgbuf code
(init_sysctl.c -> subr_log.c)


# 1.206 13-May-2015 pgoyette

More preparation for modularizing the SYSVxxx options. Here we
change the kern.ipc.sysvxxx sysctls into dynamic values, so each
sub-component of SYSVxxx can declare its own availability.


# 1.205 22-Apr-2015 pooka

move clock sysctls from init_sysctl.c to kern_clock.c


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.204 03-Aug-2014 apb

branches: 1.204.4;
BUILDINFO part 2: expose sysctl kern.buildinfo


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.203 08-May-2014 hannken

Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15
# 1.202 24-Mar-2014 christos

branches: 1.202.2;
- create cpu_{g,s}etmodel() and hide cpu_model from direct access.


Revision tags: riastradh-drm2-base3
# 1.201 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.200 25-Feb-2014 justin

Add kern.{ostype,osrelease,osrevision,version} kern.domainname,
kern.rawpartition sysctl support to rump kernel.
Moved the sysctl support that is shared between rump and normal
kernels to init_sysctl_base.c as rump cannot use init_sysctl.c
in order to avoid code duplication. Agreed with pooka@.


# 1.199 17-Jan-2014 pooka

Put cprng sysctls into subr_cprng.c. Also, make sysctl_prng static
in subr_cprng and get rid of SYSCTL_PRIVATE namespace leak macro.

Fixes ping(8) when run against a standalone rump kernel due to appearance
of the kern.urandom sysctl node (in case someone was wondering ...)


# 1.198 14-Sep-2013 joerg

GC various arrays defined and used in kern_proc.c


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.197 18-Mar-2013 para

branches: 1.197.6;
calculate vnode cache size based on the resource it gets allocated from
this stops setting kern.maxvnodes so high that it exhausts available space in kmem

http://mail-index.netbsd.org/tech-kern/2013/03/08/msg015095.html


# 1.196 07-Mar-2013 matt

Add a kern.configname sysctl object.


# 1.195 21-Feb-2013 pgoyette

Move boottime50 and its associated sysctl into the compat module. As
noted on tech-kern. Should fix PR/47579.

OK christos@

Will request pull-up to 6.0 in a few days.


# 1.194 02-Feb-2013 matt

Make the inclusion of <sys/cprng.h> a private matter for sysctl. No reason
to expose the rest of the kernel to it.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.193 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


# 1.192 08-Oct-2012 pooka

put all kern socket sysctls in the same place


# 1.191 03-Oct-2012 mlelstv

Add sanity check to sysctl_kern_maxvnodes.


# 1.190 02-Jun-2012 dsl

branches: 1.190.2;
Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.189 07-Apr-2012 christos

remove bogus check.


Revision tags: jmcneill-usbmp-base8 jmcneill-usbmp-base7
# 1.188 10-Mar-2012 joerg

P1003_1B_SEMAPHORE is no longer optional.


Revision tags: jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.187 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-0-5-RELEASE netbsd-6-0-4-RELEASE netbsd-6-0-3-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.186 17-Dec-2011 tls

branches: 1.186.2;

Separate /dev/random pseudodevice implementation from kernel entropy pool
implementation. Rewrite pseudodevice code to use cprng_strong(9).

The new pseudodevice is cloning, so each caller gets bits from a stream
generated with its own key. Users of /dev/urandom get their generators
keyed on a "best effort" basis -- the kernel will rekey generators
whenever the entropy pool hits the high water mark -- while users of
/dev/random get their generators rekeyed every time key-length bits
are output.

The underlying cprng_strong API can use AES-256 or AES-128, but we use
AES-128 because of concerns about related-key attacks on AES-256. This
improves performance (and reduces entropy pool depletion) significantly
for users of /dev/urandom but does cause users of /dev/random to rekey
twice as often.

Also fixes various bugs (including some missing locking and a reseed-counter
overflow in the CTR_DRBG code) found while testing this.

For long reads, this generator is approximately 20 times as fast as the
old generator (dd with bs=64K yields 53MB/sec on 2Ghz Core2 instead of
2.5MB/sec) and also uses a separate mutex per instance so concurrency
is greatly improved. For reads of typical key sizes for modern
cryptosystems (16-32 bytes) performance is about the same as the old
code: a little better for 32 bytes, a little worse for 16 bytes.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base
# 1.185 20-Nov-2011 tls

branches: 1.185.2;
An undocumented behavior of the sysctl kern.arandom node used to allow
sucking up to 8192 bytes out of the kernel arc4random() generator at a
time. Supposedly some very old application code uses this to rekey
other instances of RC4 in userspace (a truly great idea). Reduce the
limit to 256 bytes -- and note that it will probably be reduced to
sizeof(int) in the future, since this node is so documented.


# 1.184 19-Nov-2011 tls

First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.


Revision tags: jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.183 30-Aug-2011 bouyer

branches: 1.183.2;
Add getlabelusesmbr(), as proposed in
http://mail-index.netbsd.org/tech-userlevel/2011/08/25/msg005404.html
This is used by disk tools such as disklabel(8) to dynamically decide if
the underlying platform uses a disklabel-in-mbr-partition or not
(instead of using a compile-time list of ports).
getlabelusesmbr() reads the sysctl kern.labelusesmbr, takes its value from the
machdep #define LABELUSESMBR.
For evbmips, make LABELUSESMBR 1 if the platform uses pmon
as bootloader, and 0 (the previous value) otherwise.


# 1.182 23-Jul-2011 jym

When KERN_SA is not defined, kern.no_sa_support is a constant (1). So
add CTLFLAG_IMMEDIATE to flags. Make the macro block logically reversed so
it looks more natural when reading.

Reported by Peter Tworek on tech-kern@.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.181 24-May-2011 joerg

Add some needed __UNCONST


# 1.180 02-Apr-2011 rmind

vfs_drainvnodes: drop lwp argument, remove variable name in prototype.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.179 05-Feb-2011 christos

avoid code duplication.


# 1.178 28-Jan-2011 pooka

migrate compat32 handling with previous

pointed out by Lars Heidieker


# 1.177 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


# 1.176 22-Jan-2011 christos

Use the L_ flags instead of the P_ flags for lwps.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.175 01-Jul-2010 rmind

branches: 1.175.2; 1.175.4;
Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


# 1.174 16-Jun-2010 pooka

Set kinfo_lwp to 0 before filling it so that if someone removes
variable assignments from here, kernel memory does not leak to
userspace.

Bug found, a little bit surprisingly, by the atf ps test, which failed
because the -o holdcnt column was too wide due to the displayed
contents being garbage.
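
The defensive pattern described in this entry can be sketched as follows. The structure and function names are hypothetical and much smaller than the real kinfo_lwp; the point is only that zeroing the whole structure first means a forgotten field reads as 0 rather than as stale memory.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for a structure copied out to userland. */
struct kinfo_sketch {
	int	pid;
	int	holdcnt;	/* deliberately not assigned by the filler below */
	char	name[8];
};

/* Fill ki for export; name must be a NUL-terminated string. */
static void
fill_kinfo_sketch(struct kinfo_sketch *ki, int pid, const char *name)
{
	assert(ki != NULL && name != NULL);

	/* Zero everything first so unassigned fields cannot leak old data. */
	memset(ki, 0, sizeof(*ki));

	ki->pid = pid;
	/* Copy at most 7 bytes; the memset above supplies the terminator. */
	strncpy(ki->name, name, sizeof(ki->name) - 1);
}
```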


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.173 13-Feb-2010 yamt

branches: 1.173.2;
sysctl_doeproc: don't follow a possibly stale pointer.


Revision tags: uebayasi-xip-base
# 1.172 13-Jan-2010 pooka

branches: 1.172.2;
Minimize unnecessary differences in rump.


# 1.171 24-Dec-2009 elad

When reporting open files using sysctl, don't use 'filehead' to fetch files,
as we don't have a process context to authorize on. Instead, traverse the
file descriptor table of each process -- as we already do in one case.

Introduce a "marker" we can use to mark files we've seen in an iteration, as
the same file can be referenced more than once.

Hopefully this availability of filtering by process also makes life easier
for those who are interested in implementing process "containers" etc.


Revision tags: matt-premerge-20091211
# 1.170 12-Dec-2009 dsl

Report L_INMEM in the lwp info as well.


# 1.169 12-Dec-2009 dsl

Always set L_INMEM to maintain binary compatibility.


Revision tags: jym-xensuspend-nbase
# 1.168 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.167 16-Sep-2009 pooka

Chop init_sysctl into base nodes (init_sysctl_base.c) and the
kitchen sink (init_sysctl.c). Further surgery may be needed down
the line.


Revision tags: yamt-nfs-mp-base8
# 1.166 11-Sep-2009 apb

Expose the kernel's boothowto(9) variable through the sysctl
kern.boothowto variable.

Part of the /etc/rc silent changes requested in PR 41946
and proposed in tech-userlevel.


Revision tags: yamt-nfs-mp-base7
# 1.165 16-Aug-2009 christos

provide compatibility for the older variant of kern.consdev, which used
a 32 bit dev_t. Reported by mrg.


Revision tags: jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.164 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


# 1.163 16-May-2009 yamt

sysctl_doeproc:
- simplify.
- KERN_PROC: fix possible stale proc pointer dereference.
- KERN_PROC: don't do copyout with proc_lock held.


Revision tags: yamt-nfs-mp-base4 jym-xensuspend-base
# 1.162 12-May-2009 yamt

don't forget to skip marker processes.


# 1.161 04-May-2009 yamt

sysctl_doeproc: fix a bug in rev.1.135.
don't forget to mark our marker process PK_MARKER.
this fixes crashes in sched_pstats, etc.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.160 29-Mar-2009 mrg

- add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.

- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.

- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)

- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)

- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.

- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but that it's best left
as part of the full-update of compat/freebsd.)


this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.

tested on i386 and sparc64, build tested on several other platforms.

thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)


# 1.159 11-Mar-2009 mrg

like KERN_FILE2: *do* update "needed" when there is no count. we want
userland to know what sort of size to provide..

while here, slightly normalise the previous to init_sysctl.c.


# 1.158 11-Mar-2009 mrg

always calculate "needed" for KERN_FILE2 calls. this allows a caller
to get an estimate of the needed space, like the intention is.


# 1.157 08-Mar-2009 ad

Don't bother with file_t::f_iflags any more, as it's not used.
Noted by mrg@.


Revision tags: nick-hppapmap-base2
# 1.156 13-Feb-2009 apb

Use "defopt MODULAR" in sys/conf/files, and #include "opt_modular.h"
in all kernel sources that use the MODULAR option.
Proposed in tech-kern on 18 Jan 2009.


Revision tags: mjf-devfs2-base
# 1.155 17-Jan-2009 cegger

branches: 1.155.2;
whitespace nit


# 1.154 17-Jan-2009 yamt

malloc -> kmem_alloc.


# 1.153 11-Jan-2009 christos

merge christos-time_t


Revision tags: christos-time_t-nbase christos-time_t-base
# 1.152 29-Dec-2008 pooka

Rename specfs_lock as device_lock and move it from specfs to devsw.
Relaxes kernel dependency on vfs.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.151 28-Nov-2008 elad

PR/40002: Daniel Horecki: sockstat doesn't work for user with sysctl
security.curtain=1

If the kauth call failed, we'd silently continue the loop, but the error
code would remain and eventually "leak" to userspace. Reset the error to
zero when continuing.

Tested by snj@ and myself. Okay snj@.
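
The bug class fixed here is easy to reproduce in a small loop. The sketch below is illustrative only: `cansee_sketch` is a hypothetical stand-in for the kauth check, and the uid-based rule inside it is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical authorization check: 0 if visible, EPERM otherwise. */
static int
cansee_sketch(int owner_uid, int caller_uid)
{
	return (caller_uid == 0 || owner_uid == caller_uid) ? 0 : EPERM;
}

/*
 * Count the entries the caller may see. Entries that fail the check
 * are skipped, and the error is reset so that a failure on the last
 * entry is not returned to the caller.
 */
static int
count_visible_sketch(const int *owners, size_t n, int caller_uid,
    size_t *count)
{
	int error = 0;

	assert(count != NULL);
	*count = 0;
	for (size_t i = 0; i < n; i++) {
		error = cansee_sketch(owners[i], caller_uid);
		if (error != 0) {
			error = 0;	/* skipping is not a failure */
			continue;
		}
		(*count)++;
	}
	return error;
}
```

Without the reset, a list whose final entry belongs to another user would make the whole request fail with EPERM even though the earlier entries were reported correctly.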


# 1.150 12-Nov-2008 ad

Allow the POSIX semaphore code to be loaded as a module.


Revision tags: netbsd-5-base matt-mips64-base2
# 1.149 22-Oct-2008 ad

branches: 1.149.2; 1.149.4;
Set kern.posix_semaphores at runtime so it can be a module.
(Picked wrong header the last time.)


# 1.148 22-Oct-2008 ad

Set kern.posix_semaphores at runtime so it can be a module.


Revision tags: haad-dm-base1
# 1.147 19-Oct-2008 christos

rename proc_representative_lwp to proc_active_lwp and clarify it is for
ps display purposes. suggested by rmind.


# 1.146 19-Oct-2008 christos

Select a "representative" lwp instead of the first lwp in the list. The
first lwp in the list is the last created and in the firefox and gtk-gnash
case this is usually a zombie, so the status in ps was ZLl. This now picks
the lwp in order ONPROC > RUN > SLEEP > STOP > SUSPENDED > IDL > DEAD > ZOMB
and breaks ties using cpticks.
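
The selection rule above can be sketched as a single pass over the LWPs. The types and names below are hypothetical, and the tie-break direction (more cpticks wins) is an assumption, since the log message only says that cpticks is used to break ties.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state codes, listed in order of preference. */
enum lwp_state_sketch {
	ST_ONPROC, ST_RUN, ST_SLEEP, ST_STOP,
	ST_SUSPENDED, ST_IDL, ST_DEAD, ST_ZOMB
};

struct lwp_sketch {
	enum lwp_state_sketch	state;
	unsigned int		cpticks;
};

/*
 * Return the most "representative" entry: the one with the most
 * preferred state, and among equals the one with more cpticks
 * (assumed direction). Returns NULL for an empty list.
 */
static const struct lwp_sketch *
pick_lwp_sketch(const struct lwp_sketch *lwps, size_t n)
{
	const struct lwp_sketch *best = NULL;

	assert(lwps != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct lwp_sketch *l = &lwps[i];

		if (best == NULL || l->state < best->state ||
		    (l->state == best->state && l->cpticks > best->cpticks))
			best = l;
	}
	return best;
}
```

With this ordering, a process whose newest LWP is a zombie but which still has a sleeping thread is reported as sleeping, which matches the ps symptom described in the entry.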


# 1.145 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.144 15-Jul-2008 christos

make l_flags contain more stuff. Fixes top thread display where system processes
were always displayed.


# 1.143 02-Jul-2008 rmind

branches: 1.143.2;
Remove proc_representative_lwp(), use a simple LIST_FIRST() instead.
OK by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 wrstuden-revivesa-base
# 1.142 16-Jun-2008 ad

PR kern/38927: processes getting stuck in uvm_map (cv_timedwait), hanging
machine

Assume that a vnode (and associated data structures) costs 2kB in the
worst imaginable case. Don't allow sysctl to set desiredvnodes to a
value that would use more than 75% of KVA or 75% of physical memory.
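
The arithmetic behind this limit can be sketched as below. The 2kB cost and the 75% ceiling come from the log message; the function name, its parameters, and the choice to take the smaller of the two resources are assumptions made for illustration, not the code in sysctl_kern_maxvnodes().

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case cost of one vnode in bytes, as assumed in the log message. */
#define VNODE_COST_SKETCH	UINT64_C(2048)

/*
 * Return 1 if new_vnodes would stay within 75% of both the kernel
 * virtual address space and physical memory, 0 otherwise.
 */
static int
maxvnodes_ok_sketch(uint64_t new_vnodes, uint64_t kva_bytes,
    uint64_t phys_bytes)
{
	uint64_t limit = kva_bytes < phys_bytes ? kva_bytes : phys_bytes;
	/* 75% of the smaller resource, converted to a vnode count. */
	uint64_t max_vnodes = limit / 4 * 3 / VNODE_COST_SKETCH;

	assert(VNODE_COST_SKETCH != 0);
	return new_vnodes <= max_vnodes;
}
```

For example, with 1 GiB of KVA and 4 GiB of RAM the ceiling works out to 393216 vnodes, so a request for one more than that would be refused.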


# 1.141 16-Jun-2008 ad

- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


Revision tags: yamt-pf42-base3
# 1.140 31-May-2008 ad

branches: 1.140.2;
Kill devsw_lock and just use specfs_lock. The two would need merging
in order to prevent unload of modules when a device that they provide
is still open.


# 1.139 25-May-2008 christos

don't forget to fill in the emulation.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.138 12-May-2008 ad

Use cpu_index(), not ci_cpuid.


# 1.137 30-Apr-2008 ad

branches: 1.137.2;
KERN_FILE_BYPID: fix locking botch.


# 1.136 29-Apr-2008 ad

Don't try grabbing a zombie's p_reflock.


# 1.135 29-Apr-2008 ad

PR kern/37917 /bin/ps no longer shows zombies


# 1.134 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.133 24-Apr-2008 ad

branches: 1.133.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.132 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.131 05-Apr-2008 yamt

branches: 1.131.2;
- l_wmesg is not always valid. check l_wchan when using l_wmesg.
should fix a crash reported by Juan RP on current-users@.
- ttyinfo: lock lwp when accessing l_wmesg.
- fill_lwp: add an assertion.


# 1.130 04-Apr-2008 cegger

use device_xname() where appropriate
OK martin


# 1.129 02-Apr-2008 xtraeme

Revert rev 1.126-1.128. The original code was correct and rmind and I
didn't look correctly at them.


# 1.128 01-Apr-2008 xtraeme

When copying l_name and l_wmesg use KI_LNAMELEN and KI_WMESGLEN
respectively, so that we don't care if l_name/wmesg is longer
than kl_name/wmesg and the KASSERTs added in previous can go away.


# 1.127 01-Apr-2008 xtraeme

Fix previous: use the length of l->l_foo not kl->l_foo and add
two KASSERTs to check for max length limits before copying.

As suggested by rmind@.


# 1.126 01-Apr-2008 xtraeme

fill_lwp: when copying l_wmesg and l_name, use the size of the string
not of the variable.

Found and ok by rmind@.


# 1.125 27-Mar-2008 ad

branches: 1.125.2;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.124 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.123 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.122 30-Jan-2008 ad

branches: 1.122.2; 1.122.6;
Another locking botch.


# 1.121 28-Jan-2008 ad

More file/proc locking fixes.


Revision tags: bouyer-xeni386-nbase
# 1.120 23-Jan-2008 elad

Tons of process scope changes.

- Add a KAUTH_PROCESS_SCHEDULER action, to handle scheduler related
requests, and add specific requests for set/get scheduler policy and
set/get scheduler parameters.

- Add a KAUTH_PROCESS_KEVENT_FILTER action, to handle kevent(2) related
requests.

- Add a KAUTH_DEVICE_TTY_STI action to handle requests to TIOCSTI.

- Add requests for the KAUTH_PROCESS_CANSEE action, indicating what
process information is being looked at (entry itself, args, env,
open files).

- Add requests for the KAUTH_PROCESS_RLIMIT action indicating set/get.

- Add requests for the KAUTH_PROCESS_CORENAME action indicating set/get.

- Make bsd44 secmodel code handle the newly added requests appropriately.

All of the above make it possible to issue finer-grained kauth(9) calls in
many places, removing some KAUTH_GENERIC_ISSUSER requests.

- Remove the "CAN" from KAUTH_PROCESS_CAN{KTRACE,PROCFS,PTRACE,SIGNAL}.

Discussed with christos@ and yamt@.


Revision tags: bouyer-xeni386-base
# 1.119 12-Jan-2008 ad

sysctl_kern_proc_args: avoid zero length allocation.


Revision tags: matt-armv6-base
# 1.118 07-Jan-2008 ad

Patch up sysctl locking:

- Lock processes, credentials, filehead etc correctly.
- Acquire a read hold on sysctl_treelock if only doing a query.
- Don't wire down the output buffer. It doesn't work correctly and the code
regularly does long term sleeps with it held - it's not worth it.
- Don't hold locks other than sysctl_lock while doing copyout().
- Drop sysctl_lock while doing copyout / allocating memory in a few places.
- Don't take kernel_lock for sysctl.
- Fix a number of bugs spotted along the way


# 1.117 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.116 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.115 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.114 10-Dec-2007 elad

- Use KAUTH_ARG() instead of casts,
- Don't ignore return value of settime() in sysctl_kern_rtc_offset(), as
suggested by yamt@.

Note: the kauth(9) call in sysctl_kern_rtc_offset() is bogus, but this will
be addressed separately.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base jmcneill-pm-base reinoud-bufcleanup-base
# 1.113 06-Nov-2007 ad

branches: 1.113.2; 1.113.4; 1.113.6;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.112 19-Oct-2007 ad

branches: 1.112.2;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h


Revision tags: yamt-x86pmap-base4
# 1.111 16-Oct-2007 christos

branches: 1.111.2;
Don't fail to produce the argument vector if the program has modified it
by deleting arguments. This is a popular practice, and failing means that
ps(1) prints (programname). For example this is what XtOpenDisplay() does
with -geometry. This used to work before 2.0H, and the behavior is allowed and
hinted by POSIX. Found out by Anon Ymous.


# 1.110 16-Oct-2007 christos

- fix comment sentence capitalization.
- whitespace cleanup.
No functional changes.


# 1.109 15-Oct-2007 ad

Add _SC_NPROCESSORS_ONLN and _SC_NPROCESSORS_CONF for sysconf(). These
are extensions but are provided by many Unix systems.


Revision tags: yamt-x86pmap-base3
# 1.108 13-Oct-2007 rmind

sysctl_kern_lwp: Use a correct variable when rechecking if LWP still
exists after relocking. Found via CID: 4689. OK by <dsl>.


Revision tags: vmlocking-base
# 1.107 08-Oct-2007 ad

Merge from vmlocking: don't hold scheduler locks across copyout().


Revision tags: yamt-x86pmap-base2
# 1.106 28-Sep-2007 joerg

Add kern.no_sa_support to easily detect whether a kernel supports
Scheduler Activation or not. This is a negative name as ld.so.conf
conditionals treat undefined sysctls like 0.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.105 15-Aug-2007 ad

branches: 1.105.2; 1.105.4;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.104 06-Aug-2007 yamt

branches: 1.104.2;
remove a homegrown definition of CPU_INFO_FOREACH.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.103 09-Jul-2007 ad

branches: 1.103.2; 1.103.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.102 30-Jun-2007 dsl

Add a flags parameter to kauth_cred_get/setgroups() so that sys_get/setgroups
can copy directly to/from userspace.
Avoids exposing the implementation of the group list as an array to code
outside kern_auth.c.
compat code and man page need updating.


# 1.101 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still needs work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.100 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


Revision tags: thorpej-atomic-base
# 1.99 11-Mar-2007 ad

branches: 1.99.2;
Add the LWP's runtime to kinfo_lwp.


# 1.98 09-Mar-2007 ad

branches: 1.98.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


Revision tags: ad-audiomp-base
# 1.97 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.96 15-Feb-2007 ad

branches: 1.96.2;
Count the number of CPUs at boot and stash in 'ncpu'. Eventually should
have each CPU register at attach, so we can figure out the topology for
the scheduler.


Revision tags: post-newlock2-merge
# 1.95 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase newlock2-base
# 1.94 22-Jan-2007 elad

Don't rely on KAUTH_PROCESS_CANSEE for environment just yet,
otherwise we're allowing anyone to read the environment unless
curtain is enabled.

From yamt@.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 netbsd-4-base
# 1.93 27-Nov-2006 elad

branches: 1.93.2;
Move Veriexec's sysctl(9) setup routine and helper to kern_verifiedexec.c.


# 1.92 25-Nov-2006 christos

PR/34837: Mindaguas: Add SysV SHM dynamic reallocation and locking to the
physical memory


# 1.91 01-Nov-2006 christos

implement kern.arandom properly, instead of lying about it and only filling
the first 4 bytes of the array with random data.


# 1.90 29-Oct-2006 christos

add the emulation in kinfo_proc2


Revision tags: yamt-splraiseipl-base2
# 1.89 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.88 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


# 1.87 24-Sep-2006 dogcow

correct dcopyout #define for !KTRACE case.


# 1.86 23-Sep-2006 manu

Add a -t+S flag to ktrace for tracing activity related to sysctl. MIB
names will be displayed, with data read and written as well.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.85 13-Sep-2006 elad

branches: 1.85.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.84 10-Sep-2006 manu

When getting the program argument or environment strings, we previously
assumed that all the strings were stored in a row, separated by NUL chars,
at the address pointed to by argv[0] (or envp[0]).

This was wrong: if the program changed argv[0], we still read the
first string correctly, but the next strings did contain unexpected data.

The fix: read the whole argv (or envp) array, then copy the strings one by
one, using their addresses in argv (or envp)
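
A minimal userland sketch of the approach described in 1.84, not the kernel
code itself: each string is fetched through its own pointer in the vector
instead of assuming the strings sit back to back after the first one. The
function name pack_args and its return convention are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy n strings into out, each followed by its NUL.
 * Every string is located through vec[i], so the result is correct even
 * when the program has re-pointed or reordered entries of the vector.
 * Returns the number of bytes written, or 0 if out is too small.
 */
static size_t
pack_args(char *const vec[], size_t n, char *out, size_t outlen)
{
	size_t used = 0;

	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(vec[i]) + 1;	/* include the NUL */

		if (len > outlen - used)
			return 0;
		memcpy(out + used, vec[i], len);
		used += len;
	}
	return used;
}
```

The real code has to fetch both the pointer array and the strings from
another process's address space, which this sketch leaves out.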


# 1.83 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: rpaulo-netinet-merge-pcb-base
# 1.82 08-Sep-2006 manu

When collecting a 32 bit process' argument or environment vector, we need
to convert 32 bit pointers to the 64 bit environment


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.81 26-Jul-2006 dogcow

branches: 1.81.4;
at the request of elad, as veriexec.h has returned, revert the changes
from 2006-07-25.


# 1.80 25-Jul-2006 dogcow

mechanically go through and
s,include "veriexec.h",include <sys/verified_exec.h>,
as the former has apparently gone away.


# 1.79 24-Jul-2006 elad

some fixes:
- adapt to NVERIEXEC in init_sysctl.c.
- we now need "veriexec.h" for NVERIEXEC.
- "opt_verified_exec.h" -> "opt_veriexec.h", and include it only where
it is needed.


# 1.78 23-Jul-2006 ad

Use the LWP cached credentials where sane.


# 1.77 17-Jul-2006 ad

- Don't cast kauth_cred_t to (struct ucred *), just set pc_ucred = NULL.
- Fill ucred::cr_ref.


# 1.76 16-Jul-2006 elad

CURTAIN() -> KAUTH_GENERIC_CANSEE.


# 1.75 14-Jul-2006 elad

move security.setid_core.* to kern.coredump.setid.*, as requested by yamt@.


Revision tags: yamt-pdpolicy-base6
# 1.74 21-Jun-2006 christos

Don't leak memory on success. Allocate only the type of struct that we'll
need for efficiency.


# 1.73 20-Jun-2006 christos

don't allocate too much stuff on the stack.


Revision tags: chap-midi-nbase chap-midi-base
# 1.72 17-Jun-2006 yamt

sysctl_security_setidcorename: don't allocate MAXPATHLEN bytes on stack.


Revision tags: gdamore-uart-base
# 1.71 13-Jun-2006 yamt

branches: 1.71.2;
remove unnecessary arguments from kauth_authorize_process.
ie. make it similar to the one found in apple TN.


# 1.70 13-Jun-2006 yamt

sysctl_kern_file, sysctl_kern_file2: don't abuse kauth_authorize_process
for non-process objects.


# 1.69 13-Jun-2006 yamt

sysctl_kern_file2: fix an indent.


Revision tags: yamt-pdpolicy-base5 simonb-timecounters-base
# 1.68 14-May-2006 elad

branches: 1.68.2;
integrate kauth.


Revision tags: elad-kernelauth-base
# 1.67 17-Apr-2006 elad

Move securelevel-specific stuff to its own file.


# 1.66 14-Apr-2006 blymn

Make i/o statistics collection more generic, include tape drives and
nfs mounts in the set of devices that statistics will be reported on.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.65 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.64 26-Mar-2006 erh

When DIAGNOSTIC is defined, provide a kern.panic_now sysctl to conveniently
and reliably panic the system


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.63 01-Mar-2006 yamt

branches: 1.63.2; 1.63.4; 1.63.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.62 04-Feb-2006 yamt

for some random places, use PNBUF_GET/PUT rather than
- on-stack buffer
- malloc(MAXPATHLEN)


# 1.61 02-Feb-2006 elad

branches: 1.61.2;
implement a security.setid_core node as discussed on tech-kern@ and
tech-security@.


# 1.60 27-Jan-2006 elad

branches: 1.60.2;
remove security node sysctl objects; they are now created using CTL_CREATE.


# 1.59 26-Dec-2005 perry

branches: 1.59.2;
u_intN_t -> uintN_t


# 1.58 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.57 05-Dec-2005 christos

- make settime take timespec.
- avoid wrapping of time in settime.
- pass struct proc down so that we can log a detailed message.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.56 08-Oct-2005 yamt

sysctl_kern_proc_args: don't assume that the process is
resident while we are sleeping.


# 1.55 07-Sep-2005 elad

Implement curtain in KERN_{PROC,PROC2,FILE,FILE2,PROC_ARGS}.
While I'm here, disable curtain by default.


# 1.54 07-Sep-2005 elad

Introduce ``security.curtain'', new node for security features and
settings, and new variable for controlling access to objects based
on user-id.


# 1.53 06-Sep-2005 rpaulo

Implement kern.hardclock_ticks.


# 1.52 24-Aug-2005 simonb

Fix a tyop in a comment.


# 1.51 13-Aug-2005 blymn

Remove the tape stats from here, they caused issues on non-scsipi
architectures.


# 1.50 08-Aug-2005 blymn

Don't include tape stats functions if no devices configured.


# 1.49 07-Aug-2005 blymn

Add tape statistics gathering functions.


# 1.48 29-Jul-2005 elad

#ifdef VERIFIED_EXEC


# 1.47 16-Jul-2005 christos

defopt verified_exec.


# 1.46 17-Jun-2005 atatat

branches: 1.46.2;
Comment in new cp_id implementation was wrong since I abandoned
rewriting it in favor of some testing and then never got back to it.
It's better now.


# 1.45 16-Jun-2005 christos

Add a new sysctl 'cp_id' that returns the array of cpu id values. Requested by
me, implemented by atatat.


# 1.44 15-Jun-2005 elad

Fix sysctl handling for raise-only variables. This affected the veriexec
node entirely. Reported by Nino Dehne.


# 1.43 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.42 06-Jun-2005 jdc

Revert previous ('_ncpus' is now 'ncpus' again).
MI variable names have precedence.


# 1.41 05-Jun-2005 jdc

Rename 'ncpus' to '_ncpus', otherwise we shadow sparc/sparc64's 'ncpus'
when MULTIPROCESSOR is defined.


# 1.40 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.39 22-May-2005 elad

Add indication for number of fingerprinted files on each device.

When a table is created for a new device, a new variable is created
under the kern.veriexec.count node named "dev_<id>". For example,
dev_0, dev_3, etc.


# 1.38 19-May-2005 elad

Some changes in veriexec.

New features:

- Add a veriexec_report() routine to make most reporting consistent and
remove some common code.
- Add 'strict' mode that controls how veriexec behaves.
- Add sysctl knobs:
o kern.veriexec.verbose controls verbosity levels. Value: 0, 1.
o kern.veriexec.strict controls strict level. Values: 0, 1, 2. See
documentation in sysctl(3) for details.
o kern.veriexec.algorithms returns a string with a space separated
list of supported hashing algorithms in veriexec.
- Updated documentation in man pages for sysctl(3) and sysctl(8).

Bug fixes:

- veriexec_removechk(): Code cleanup + handle FINGERPRINT_NOTEVAL
correctly.
- exec_script(): Don't pass 0 as flag when executing a script; use the
defined VERIEXEC_INDIRECT - which is 1. Makes indirect execution
enforcement work.
- Fix some printing formats and types.


Revision tags: kent-audio2-base
# 1.37 18-Apr-2005 mrg

be explicit in the description for POSIX saved set-id that this is for
POSIX-style, not sane-style. (ie, add "POSIX " to the description.)


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.36 11-Mar-2005 atatat

branches: 1.36.2;
Revert the change that made kern.file2 and net.*.*.pcblist into nodes
instead of structs. It had other deleterious side-effects that are
rather nasty. Another solution must be found.


# 1.35 10-Mar-2005 atatat

Change types of kern.file2 and net.*.*.pcblist to NODE


# 1.34 09-Mar-2005 atatat

Add kern.file2. As kern.proc2 is to kern.proc, so is kern.file2 to
kern.file, namely a 32/64 bit clean sysctl interface to the same data.
It also borrows a few things from struct vnode (if applicable) and
from struct proc, just to tie things together a bit more.

You can walk this list "by file" or "by pid". The former method is
similar to kern.file but omits the filehead, and the latter can give
you duplicates if multiple processes have the same struct file open,
but tells you which process it is.


# 1.33 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.32 01-Oct-2004 yamt

branches: 1.32.4; 1.32.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.31 27-Jul-2004 atatat

branches: 1.31.2;
The message buffer datum instrumented by KERN_MSGBUFSIZE is actually a
long, not an int, and this causes "problems" on LP64be machines
(sparc64, etc). Assign the value to a temporary int and instrument
that instead. Should be fine until someone wants a message buffer
larger than two gigabytes.
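
The problem described in 1.31 can be illustrated in portable C by modelling
the byte layout by hand. This is a sketch under assumptions: the helper names
are hypothetical, and the 8-byte big-endian layout stands in for a long on an
LP64 big-endian machine.

```c
#include <stdint.h>

/* Lay out v as 8 bytes, most significant first (LP64 big-endian long). */
static void
store_be64(uint64_t v, unsigned char out[8])
{
	for (int i = 0; i < 8; i++)
		out[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* Read 4 bytes as a big-endian 32-bit value, as an int consumer would. */
static uint32_t
load_be32(const unsigned char in[4])
{
	return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
	    ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* The fix: narrow to a temporary int and export that instead. */
static int
narrow_for_export(long size)
{
	int tmp = (int)size;	/* fine while size fits in an int */

	return tmp;
}
```

Reading the first four bytes of the stored long yields its upper half, which
is zero for any realistic buffer size; the value itself sits in the last
four bytes.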


# 1.30 26-May-2004 christos

(off_t)(long) is wrong when it comes to kernel addresses [because on a 32 bit
machine if the high bit is set they turn negative]. Make an intermediate cast
to unsigned long.
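
A small sketch of the cast problem described in 1.30, using fixed-width types
so that it behaves the same on any host. The helper names are hypothetical;
uint32_t stands in for a 32-bit kernel address and int64_t for off_t.

```c
#include <stdint.h>

/*
 * Widening through a signed 32-bit type: an address with the high bit set
 * becomes negative. (The unsigned-to-signed conversion is implementation
 * defined in C; common compilers wrap to two's complement.)
 */
static int64_t
widen_via_signed(uint32_t addr)
{
	return (int64_t)(int32_t)addr;
}

/* Widening through an unsigned type keeps the value intact. */
static int64_t
widen_via_unsigned(uint32_t addr)
{
	return (int64_t)addr;
}
```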


# 1.29 03-May-2004 martin

Fix a comment.
Approved by Andrew Brown.


# 1.28 23-Apr-2004 simonb

s/the the/the/ (only in sources that aren't regularly imported from
elsewhere).


# 1.27 16-Apr-2004 atatat

Prefer that kern.hostid is printed in hex, not as a signed decimal,
and avoid accidental sign-extension when setting it.


# 1.26 08-Apr-2004 atatat

Lots of sysctl descriptions (if someone wants to help out here, that
would be good) mostly copied from sysctl(3). This takes care of the
top-level, most of kern.* and hw.* (modulo the ath and bge stuff), and
all of proc.*.

If you don't want the added rodata in your kernel, use "options
SYSCTL_NO_DESCR" in your kernel config.


# 1.25 08-Apr-2004 atatat

Clear out the struct kinfo_drivers before stuffing things into it.
Avoids leaking garbage from the stack (left over from the earlier
call to sysctl_locate()).


Revision tags: netbsd-2-0-base
# 1.24 24-Mar-2004 atatat

branches: 1.24.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.23 17-Mar-2004 yamt

- move kern.somaxkva sysctl stuff from init_sysctl.c to uipc_socket.c.
- when changing its value, wakeup sokva waiters.


# 1.22 21-Feb-2004 atatat

Use KERN_PROCSLOP for struct kinfo_proc and KERN_LWPSLOP for
struct kinfo_lwp, and not vice versa.

Should solve the issue with top dying because it's unable to "allocate
memory".


# 1.21 19-Feb-2004 atatat

Use new PTRTOUINT64() macro instead of local PTRTOINT64() macro.


# 1.20 17-Jan-2004 atatat

Avoid dereferencing l...it might be NULL


# 1.19 28-Dec-2003 atatat

Sysctl functions called for "generic" nodes should forward "query"
requests (where possible), rather than returning errors.


# 1.18 28-Dec-2003 atatat

Adjust error returns in kern.cp_time when a specific processor is
being requested so that (1) the uniprocessor case and the
multiprocessor case are more similar and (2) so that we return ENOENT
when a non-existent processor is requested (which is both more
sensible and follows the general order of things anyway).


# 1.17 28-Dec-2003 atatat

Rename sysctl_kern_hostname() to sysctl_setlen() and use it also for
domainname. Note that there's no need to copy rnode since we're not
changing any of it, nor protecting anything from change.

Thanks to martin for initial work.


# 1.16 28-Dec-2003 atatat

RCSid police


# 1.15 28-Dec-2003 martin

After changing hostname, adjust hostnamelen.
This closes PR kern/23907.


# 1.14 26-Dec-2003 martin

Make kern.rtc_offset writable at securelevel <= 0.
This allows boot-time adjustment when a machine runs other OSes with
RTC == localtime.


# 1.13 20-Dec-2003 yamt

update a comment to match with the previous change (rev.1.12).


# 1.12 20-Dec-2003 yamt

restore functionality to decrease kern.maxvnodes which
has been backed out during sysctl rework.


# 1.11 12-Dec-2003 simonb

In sysctl_kern_lwp adjust offsets into the mib entries so that
they are now correct. Fixes problems with "ps -s" not working.
Also use KERN_LWPSLOP instead of KERN_PROCSLOP.

Both changes from Andrew Brown.


# 1.10 10-Dec-2003 atatat

Make kern.dump_on_panic writeable again, too


# 1.9 09-Dec-2003 atatat

Make kern.sbmax writeable again as well.

From a follow-on to PR kern/23695 by a Mr. Davis, which I missed at a
quick glance.


# 1.8 09-Dec-2003 atatat

Make kern.logsigexit writeable again.

Fixes PR kern/23695.


# 1.7 07-Dec-2003 martin

Add missing break.


# 1.6 07-Dec-2003 he

Also make declaration of sysctl_kern_maxptys() depend on NPTY > 0.
Makes the mvme68k RAMDISK kernel compile again.


# 1.5 06-Dec-2003 martin

Fix kern.cp_time for MULTIPROCESSOR kernels: calculate size of result
correctly, free original instead of incremented pointer, copy results for
n = -2 case too, so top shows correct stats.
Additionally, rearrange code for better readability (from Andrew).


# 1.4 06-Dec-2003 fvdl

Include opt_posix.h for the P1003_1B_SEMAPHORE define.
Include <machine/cpu.h> just to be sure.


# 1.3 06-Dec-2003 martin

We can not count CPUs at sysctl initialization time - so don't make
hw.ncpu an immediate value.


# 1.2 06-Dec-2003 atatat

#include "opt_multiprocessor.h"

This makes hw.ncpu and kern.cp_time work better on those platforms.


# 1.1 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.214 04-Feb-2018 maxv

Add a proper defflag for GPROF, and include opt_gprof.h, otherwise we're
not gonna go very far.


Revision tags: tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base
# 1.213 01-Jun-2017 chs

remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.212 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.211 31-May-2016 pgoyette

branches: 1.211.2;
Add a new kern.messages sysctl to allow kernel message verbosity to be
altered after boot.

Fixes PR kern/46539 using patch submitted by Nat Sloss.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.210 09-Nov-2015 pgoyette

Whether or not the semaphore code is loaded as a module or built-in, its
sysctl data belongs with the module code. Move it from kern/init_sysctl.c
to kern/uipc_sem.c

While here, add a new sysctl variable kern.posix.semcnt (current count of
semaphores) to complement the existing kern.posix.semmax (maximum number
of semaphores).


Revision tags: nick-nhusb-base-20150921
# 1.209 25-Aug-2015 pooka

Move a bunch of sysctl nodes from init_sysctl (kitchen sink sysctl file)
to init_sysctl_base (only base kernel defs). Main motivation was to
fix sysconf(_SC_NPROCESSORS) for Rumprun. As reported by neeraj on irc,
it returned -1 before this fix, so we were doing imaginary computing.


# 1.208 07-Jul-2015 justin

Move hw.machine and hw.machine_arch sysctls to base so rump can use them

This allows uname(3) and uname(1) to work on rump kernels.


Revision tags: nick-nhusb-base-20150606
# 1.207 20-May-2015 pooka

group msgbuf sysctls with the msgbuf code
(init_sysctl.c -> subr_log.c)


# 1.206 13-May-2015 pgoyette

More preparation for modularizing the SYSVxxx options. Here we
change the kern.ipc.sysvxxx sysctls into dynamic values, so each
sub-component of SYSVxxx can declare its own availability.


# 1.205 22-Apr-2015 pooka

move clock sysctls from init_sysctl.c to kern_clock.c


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.204 03-Aug-2014 apb

branches: 1.204.4;
BUILDINFO part 2: expose sysctl kern.buildinfo


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.203 08-May-2014 hannken

Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15
# 1.202 24-Mar-2014 christos

branches: 1.202.2;
- create cpu_{g,s}etmodel() and hide cpu_model from direct access.


Revision tags: riastradh-drm2-base3
# 1.201 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.200 25-Feb-2014 justin

Add kern.{ostype,osrelease,osrevision,version} kern.domainname,
kern.rawpartition sysctl support to rump kernel.
Moved the sysctl support that is shared between rump and normal
kernels to init_sysctl_base.c as rump cannot use init_sysctl.c
in order to avoid code duplication. Agreed with pooka@.


# 1.199 17-Jan-2014 pooka

Put cprng sysctls into subr_cprng.c. Also, make sysctl_prng static
in subr_cprng and get rid of SYSCTL_PRIVATE namespace leak macro.

Fixes ping(8) when run against a standalone rump kernel due to appearance
of the kern.urandom sysctl node (in case someone was wondering ...)


# 1.198 14-Sep-2013 joerg

GC various arrays defined and used in kern_proc.c


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.197 18-Mar-2013 para

branches: 1.197.6;
calculate vnode cache size based on the resource it gets allocated from
this stops kern.maxvnodes from being set so high that it exhausts available space in kmem

http://mail-index.netbsd.org/tech-kern/2013/03/08/msg015095.html


# 1.196 07-Mar-2013 matt

Add a kern.configname sysctl object.


# 1.195 21-Feb-2013 pgoyette

Move boottime50 and its associated sysctl into the compat module. As
noted on tech-kern. Should fix PR/47579.

OK christos@

Will request pull-up to 6.0 in a few days.


# 1.194 02-Feb-2013 matt

Make the inclusion of <sys/cprng.h> a private matter for sysctl. No reason
to expose the rest of the kernel to it.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.193 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


# 1.192 08-Oct-2012 pooka

put all kern socket sysctls in the same place


# 1.191 03-Oct-2012 mlelstv

Add sanity check to sysctl_kern_maxvnodes.


# 1.190 02-Jun-2012 dsl

branches: 1.190.2;
Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.189 07-Apr-2012 christos

remove bogus check.


Revision tags: jmcneill-usbmp-base8 jmcneill-usbmp-base7
# 1.188 10-Mar-2012 joerg

P1003_1B_SEMAPHORE is no longer optional.


Revision tags: jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.187 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-0-5-RELEASE netbsd-6-0-4-RELEASE netbsd-6-0-3-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.186 17-Dec-2011 tls

branches: 1.186.2;

Separate /dev/random pseudodevice implementation from kernel entropy pool
implementation. Rewrite pseudodevice code to use cprng_strong(9).

The new pseudodevice is cloning, so each caller gets bits from a stream
generated with its own key. Users of /dev/urandom get their generators
keyed on a "best effort" basis -- the kernel will rekey generators
whenever the entropy pool hits the high water mark -- while users of
/dev/random get their generators rekeyed every time key-length bits
are output.

The underlying cprng_strong API can use AES-256 or AES-128, but we use
AES-128 because of concerns about related-key attacks on AES-256. This
improves performance (and reduces entropy pool depletion) significantly
for users of /dev/urandom but does cause users of /dev/random to rekey
twice as often.

Also fixes various bugs (including some missing locking and a reseed-counter
overflow in the CTR_DRBG code) found while testing this.

For long reads, this generator is approximately 20 times as fast as the
old generator (dd with bs=64K yields 53MB/sec on 2Ghz Core2 instead of
2.5MB/sec) and also uses a separate mutex per instance so concurrency
is greatly improved. For reads of typical key sizes for modern
cryptosystems (16-32 bytes) performance is about the same as the old
code: a little better for 32 bytes, a little worse for 16 bytes.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base
# 1.185 20-Nov-2011 tls

branches: 1.185.2;
An undocumented behavior of the sysctl kern.arandom node used to allow
sucking up to 8192 bytes out of the kernel arc4random() generator at a
time. Supposedly some very old application code uses this to rekey
other instances of RC4 in userspace (a truly great idea). Reduce the
limit to 256 bytes -- and note that it will probably be reduced to
sizeof(int) in the future, since this node is so documented.


# 1.184 19-Nov-2011 tls

First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.


Revision tags: jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.183 30-Aug-2011 bouyer

branches: 1.183.2;
Add getlabelusesmbr(), as proposed in
http://mail-index.netbsd.org/tech-userlevel/2011/08/25/msg005404.html
This is used by disk tools such as disklabel(8) to dynamically decide if
the underlying platform uses a disklabel-in-mbr-partition or not
(instead of using a compile-time list of ports).
getlabelusesmbr() reads the sysctl kern.labelusesmbr, takes its value from the
machdep #define LABELUSESMBR.
For evbmips, make LABELUSESMBR 1 if the platform uses pmon
as bootloader, and 0 (the previous value) otherwise.


# 1.182 23-Jul-2011 jym

When KERN_SA is not defined, kern.no_sa_support is a constant (1). So
add CTLFLAG_IMMEDIATE to flags. Make the macro block logically reversed so
it looks more natural when reading.

Reported by Peter Tworek on tech-kern@.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.181 24-May-2011 joerg

Add some needed __UNCONST


# 1.180 02-Apr-2011 rmind

vfs_drainvnodes: drop lwp argument, remove variable name in prototype.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.179 05-Feb-2011 christos

avoid code duplication.


# 1.178 28-Jan-2011 pooka

migrate compat32 handling with previous

pointed out by Lars Heidieker


# 1.177 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


# 1.176 22-Jan-2011 christos

Use the L_ flags instead of the P_ flags for lwps.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.175 01-Jul-2010 rmind

branches: 1.175.2; 1.175.4;
Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


# 1.174 16-Jun-2010 pooka

Set kinfo_lwp to 0 before filling it so that if someone removes
variable assignments from here, kernel memory does not leak to
userspace.

Bug found, a little bit surprisingly, by the atf ps test which failed
due to the column width between the -o holdcnt column being too
wide due to the contents displayed being garbage.
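
The zero-before-fill pattern described in this entry can be sketched as
follows. This is only an illustration under stated assumptions: struct
fake_kinfo and fill_info() are hypothetical stand-ins, not the kernel's
struct kinfo_lwp or fill_lwp().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for the structure copied to userspace. */
struct fake_kinfo {
	int  lid;
	int  holdcnt;
	char name[16];
};

/* Fill *ki for export; fields that are never assigned still read as 0. */
static void
fill_info(struct fake_kinfo *ki, int lid, const char *name)
{
	assert(ki != NULL && name != NULL);

	/* Clear every field (and any padding) before assigning anything. */
	memset(ki, 0, sizeof(*ki));
	ki->lid = lid;
	strncpy(ki->name, name, sizeof(ki->name) - 1);
	/* holdcnt is deliberately not assigned here. */
}
```

With the memset() in place, dropping an assignment later leaves a zero in
the exported field rather than whatever happened to be in memory.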


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.173 13-Feb-2010 yamt

branches: 1.173.2;
sysctl_doeproc: don't follow a possibly stale pointer.


Revision tags: uebayasi-xip-base
# 1.172 13-Jan-2010 pooka

branches: 1.172.2;
Minimize unnecessary differences in rump.


# 1.171 24-Dec-2009 elad

When reporting open files using sysctl, don't use 'filehead' to fetch files,
as we don't have a process context to authorize on. Instead, traverse the
file descriptor table of each process -- as we already do in one case.

Introduce a "marker" we can use to mark files we've seen in an iteration, as
the same file can be referenced more than once.

Hopefully this availability of filtering by process also makes life easier
for those who are interested in implementing process "containers" etc.


Revision tags: matt-premerge-20091211
# 1.170 12-Dec-2009 dsl

Report L_INMEM in the lwp info as well.


# 1.169 12-Dec-2009 dsl

Always set L_INMEM to maintain binary compatibility.


Revision tags: jym-xensuspend-nbase
# 1.168 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.167 16-Sep-2009 pooka

Chop init_sysctl into base nodes (init_sysctl_base.c) and the
kitchen sink (init_sysctl.c). Further surgery may be needed down
the line.


Revision tags: yamt-nfs-mp-base8
# 1.166 11-Sep-2009 apb

Expose the kernel's boothowto(9) variable through the sysctl
kern.boothowto variable.

Part of the /etc/rc silent changes requested in PR 41946
and proposed in tech-userlevel.


Revision tags: yamt-nfs-mp-base7
# 1.165 16-Aug-2009 christos

provide compatibility for the older variant of kern.consdev, which used
a 32 bit dev_t. Reported by mrg.


Revision tags: jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.164 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


# 1.163 16-May-2009 yamt

sysctl_doeproc:
- simplify.
- KERN_PROC: fix possible stale proc pointer dereference.
- KERN_PROC: don't do copyout with proc_lock held.


Revision tags: yamt-nfs-mp-base4 jym-xensuspend-base
# 1.162 12-May-2009 yamt

don't forget to skip marker processes.


# 1.161 04-May-2009 yamt

sysctl_doeproc: fix a bug in rev.1.135.
don't forget to mark our marker process PK_MARKER.
this fixes crashes in sched_pstats, etc.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.160 29-Mar-2009 mrg

- add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.

- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.

- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)

- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)

- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.

- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but that it's best left
as part of the full-update of compat/freebsd.)


this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.

tested on i386 and sparc64, build tested on several other platforms.

thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)


# 1.159 11-Mar-2009 mrg

like KERN_FILE2: *do* update "needed" when there is no count. we want
userland to know what sort of size to provide..

while here, slightly normalise the previous to init_sysctl.c.


# 1.158 11-Mar-2009 mrg

always calculate "needed" for KERN_FILE2 calls. this allows a caller
to get an estimate of the needed space, like the intention is.


# 1.157 08-Mar-2009 ad

Don't bother with file_t::f_iflags any more, as it's not used.
Noted by mrg@.


Revision tags: nick-hppapmap-base2
# 1.156 13-Feb-2009 apb

Use "defopt MODULAR" in sys/conf/files, and #include "opt_modular.h"
in all kernel sources that use the MODULAR option.
Proposed in tech-kern on 18 Jan 2009.


Revision tags: mjf-devfs2-base
# 1.155 17-Jan-2009 cegger

branches: 1.155.2;
whitespace nit


# 1.154 17-Jan-2009 yamt

malloc -> kmem_alloc.


# 1.153 11-Jan-2009 christos

merge christos-time_t


Revision tags: christos-time_t-nbase christos-time_t-base
# 1.152 29-Dec-2008 pooka

Rename specfs_lock as device_lock and move it from specfs to devsw.
Relaxes kernel dependency on vfs.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.151 28-Nov-2008 elad

PR/40002: Daniel Horecki: sockstat doesn't work for user with sysctl
security.curtain=1

If the kauth call failed, we'd silently continue the loop, but the error
code would remain and eventually "leak" to userspace. Reset the error to
zero when continuing.

Tested by snj@ and myself. Okay snj@.
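
The bug class fixed here (a per-entry failure code surviving until the
function returns) can be sketched as below. The names may_see() and
count_visible() are hypothetical; the real code calls kauth(9) inside the
sysctl handler.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical authorization check: nonzero means "hidden from viewer". */
static int
may_see(int owner, int viewer)
{
	return owner == viewer ? 0 : EPERM;
}

/* Count the entries visible to viewer; returns 0 on success. */
static int
count_visible(const int *owners, size_t n, int viewer, size_t *visible)
{
	int error = 0;

	assert(visible != NULL);
	*visible = 0;
	for (size_t i = 0; i < n; i++) {
		error = may_see(owners[i], viewer);
		if (error != 0) {
			/* Skip the hidden entry and clear the stale error. */
			error = 0;
			continue;
		}
		(*visible)++;
	}
	return error;
}
```

Without the reset, a hidden entry at the end of the list would make the
whole call fail with EPERM even though the visible entries were counted.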


# 1.150 12-Nov-2008 ad

Allow the POSIX semaphore code to be loaded as a module.


Revision tags: netbsd-5-base matt-mips64-base2
# 1.149 22-Oct-2008 ad

branches: 1.149.2; 1.149.4;
Set kern.posix_semaphores at runtime so it can be a module.
(Picked wrong header the last time.)


# 1.148 22-Oct-2008 ad

Set kern.posix_semaphores at runtime so it can be a module.


Revision tags: haad-dm-base1
# 1.147 19-Oct-2008 christos

rename proc_representative_lwp to proc_active_lwp and clarify it is for
ps display purposes. suggested by rmind.


# 1.146 19-Oct-2008 christos

Select a "representative" lwp instead of the first lwp in the list. The
first lwp in the list is the last created and in the firefox and gtk-gnash
case this is usually a zombie, so the status in ps was ZLl. This now picks
the lwp in order ONPROC > RUN > SLEEP > STOP > SUSPENDED > IDL > DEAD > ZOMB
and breaks ties using cpticks.
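
The preference order above can be sketched as a simple scan. The enum,
struct fake_lwp, and pick_representative() are hypothetical simplifications,
and the sketch assumes that the higher cpticks value wins a tie, which the
log entry does not spell out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical LWP states, listed from most to least preferred. */
enum lwp_state {
	ST_ONPROC, ST_RUN, ST_SLEEP, ST_STOP,
	ST_SUSPENDED, ST_IDL, ST_DEAD, ST_ZOMB
};

/* Hypothetical, simplified stand-in for the kernel's struct lwp. */
struct fake_lwp {
	enum lwp_state state;
	unsigned int   cpticks;
};

/* Return the preferred entry among n LWPs, or NULL when n is 0. */
static const struct fake_lwp *
pick_representative(const struct fake_lwp *lwps, size_t n)
{
	const struct fake_lwp *best = NULL;

	assert(lwps != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct fake_lwp *l = &lwps[i];

		/* Lower enum value is preferred; cpticks breaks ties. */
		if (best == NULL || l->state < best->state ||
		    (l->state == best->state && l->cpticks > best->cpticks))
			best = l;
	}
	return best;
}
```

A zombie that happens to be first in the list is therefore chosen only when
every other LWP is also a zombie.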


# 1.145 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.144 15-Jul-2008 christos

make l_flags contain more stuff. Fixes top thread display where system processes
were always displayed.


# 1.143 02-Jul-2008 rmind

branches: 1.143.2;
Remove proc_representative_lwp(), use a simple LIST_FIRST() instead.
OK by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 wrstuden-revivesa-base
# 1.142 16-Jun-2008 ad

PR kern/38927: processes getting stuck in uvm_map (cv_timedwait), hanging
machine

Assume that a vnode (and associated data structures) costs 2kB in the
worst imaginable case. Don't allow sysctl to set desiredvnodes to a
value that would use more than 75% of KVA or 75% of physical memory.
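
The limit described above works out to simple arithmetic. A minimal sketch,
assuming the 2kB worst-case cost and the 75% ceilings from this entry;
max_vnodes() and vnodes_request_ok() are hypothetical names, not the
kernel's functions.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case cost of one vnode, as assumed by the log entry (2kB). */
#define VNODE_COST_BYTES 2048u

/* Largest vnode count that stays within 75% of both KVA and RAM. */
static uint64_t
max_vnodes(uint64_t kva_bytes, uint64_t phys_bytes)
{
	uint64_t limit = kva_bytes < phys_bytes ? kva_bytes : phys_bytes;

	/* 75% of the scarcer resource, divided by the per-vnode cost. */
	return (limit / 4 * 3) / VNODE_COST_BYTES;
}

/* Nonzero when a requested value would be accepted. */
static int
vnodes_request_ok(uint64_t requested, uint64_t kva, uint64_t phys)
{
	assert(VNODE_COST_BYTES != 0);
	return requested <= max_vnodes(kva, phys);
}
```

For example, with 1GB of KVA and 4GB of physical memory the sketch caps the
value at 393216 vnodes.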


# 1.141 16-Jun-2008 ad

- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


Revision tags: yamt-pf42-base3
# 1.140 31-May-2008 ad

branches: 1.140.2;
Kill devsw_lock and just use specfs_lock. The two would need merging
in order to prevent unload of modules when a device that they provide
is still open.


# 1.139 25-May-2008 christos

don't forget to fill in the emulation.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.138 12-May-2008 ad

Use cpu_index(), not ci_cpuid.


# 1.137 30-Apr-2008 ad

branches: 1.137.2;
KERN_FILE_BYPID: fix locking botch.


# 1.136 29-Apr-2008 ad

Don't try grabbing a zombie's p_reflock.


# 1.135 29-Apr-2008 ad

PR kern/37917 /bin/ps no longer shows zombies


# 1.134 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.133 24-Apr-2008 ad

branches: 1.133.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.132 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.131 05-Apr-2008 yamt

branches: 1.131.2;
- l_wmesg is not always valid. check l_wchan when using l_wmesg.
should fix a crash reported by Juan RP on current-users@.
- ttyinfo: lock lwp when accessing l_wmesg.
- fill_lwp: add an assertion.


# 1.130 04-Apr-2008 cegger

use device_xname() where appropriate
OK martin


# 1.129 02-Apr-2008 xtraeme

Revert rev 1.126-1.128. The original code was correct and rmind and I
didn't look correctly at them.


# 1.128 01-Apr-2008 xtraeme

When copying l_name and l_wmesg use KI_LNAMELEN and KI_WMESGLEN
respectively, so that we don't care if l_name/wmesg is longer
than kl_name/wmesg and the KASSERTs added in previous can go away.


# 1.127 01-Apr-2008 xtraeme

Fix previous: use the length of l->l_foo not kl->l_foo and add
two KASSERTs to check for max length limits before copying.

As suggested by rmind@.


# 1.126 01-Apr-2008 xtraeme

fill_lwp: when copying l_wmesg and l_name, use the size of the string
not of the variable.

Found and ok by rmind@.


# 1.125 27-Mar-2008 ad

branches: 1.125.2;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.124 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.123 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.122 30-Jan-2008 ad

branches: 1.122.2; 1.122.6;
Another locking botch.


# 1.121 28-Jan-2008 ad

More file/proc locking fixes.


Revision tags: bouyer-xeni386-nbase
# 1.120 23-Jan-2008 elad

Tons of process scope changes.

- Add a KAUTH_PROCESS_SCHEDULER action, to handle scheduler related
requests, and add specific requests for set/get scheduler policy and
set/get scheduler parameters.

- Add a KAUTH_PROCESS_KEVENT_FILTER action, to handle kevent(2) related
requests.

- Add a KAUTH_DEVICE_TTY_STI action to handle requests to TIOCSTI.

- Add requests for the KAUTH_PROCESS_CANSEE action, indicating what
process information is being looked at (entry itself, args, env,
open files).

- Add requests for the KAUTH_PROCESS_RLIMIT action indicating set/get.

- Add requests for the KAUTH_PROCESS_CORENAME action indicating set/get.

- Make bsd44 secmodel code handle the newly added requests appropriately.

All of the above make it possible to issue finer-grained kauth(9) calls in
many places, removing some KAUTH_GENERIC_ISSUSER requests.

- Remove the "CAN" from KAUTH_PROCESS_CAN{KTRACE,PROCFS,PTRACE,SIGNAL}.

Discussed with christos@ and yamt@.


Revision tags: bouyer-xeni386-base
# 1.119 12-Jan-2008 ad

sysctl_kern_proc_args: avoid zero length allocation.


Revision tags: matt-armv6-base
# 1.118 07-Jan-2008 ad

Patch up sysctl locking:

- Lock processes, credentials, filehead etc correctly.
- Acquire a read hold on sysctl_treelock if only doing a query.
- Don't wire down the output buffer. It doesn't work correctly and the code
regularly does long term sleeps with it held - it's not worth it.
- Don't hold locks other than sysctl_lock while doing copyout().
- Drop sysctl_lock while doing copyout / allocating memory in a few places.
- Don't take kernel_lock for sysctl.
- Fix a number of bugs spotted along the way


# 1.117 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.116 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.115 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.114 10-Dec-2007 elad

- Use KAUTH_ARG() instead of casts,
- Don't ignore return value of settime() in sysctl_kern_rtc_offset(), as
suggested by yamt@.

Note: the kauth(9) call in sysctl_kern_rtc_offset() is bogus, but this will
be addressed separately.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base jmcneill-pm-base reinoud-bufcleanup-base
# 1.113 06-Nov-2007 ad

branches: 1.113.2; 1.113.4; 1.113.6;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.112 19-Oct-2007 ad

branches: 1.112.2;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h


Revision tags: yamt-x86pmap-base4
# 1.111 16-Oct-2007 christos

branches: 1.111.2;
Don't fail to produce the argument vector if the program has modified it
by deleting arguments. This is a popular practice, and failing means that
ps(1) prints (programname). For example this is what XtOpenDisplay() does with
-geometry. This used to work before 2.0H, and the behavior is allowed and
hinted by POSIX. Found out by Anon Ymous.


# 1.110 16-Oct-2007 christos

- fix comment sentence capitalization.
- whitespace cleanup.
No functional changes.


# 1.109 15-Oct-2007 ad

Add _SC_NPROCESSORS_ONLN and _SC_NPROCESSORS_CONF for sysconf(). These
are extensions but are provided by many Unix systems.


Revision tags: yamt-x86pmap-base3
# 1.108 13-Oct-2007 rmind

sysctl_kern_lwp: Use a correct variable when rechecking if LWP still
exists after relocking. Found via CID: 4689. OK by <dsl>.


Revision tags: vmlocking-base
# 1.107 08-Oct-2007 ad

Merge from vmlocking: don't hold scheduler locks across copyout().


Revision tags: yamt-x86pmap-base2
# 1.106 28-Sep-2007 joerg

Add kern.no_sa_support to easily detect whether a kernel supports
Scheduler Activation or not. This is a negative name as ld.so.conf
conditionals treat undefined sysctls like 0.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.105 15-Aug-2007 ad

branches: 1.105.2; 1.105.4;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.104 06-Aug-2007 yamt

branches: 1.104.2;
remove a homegrown definition of CPU_INFO_FOREACH.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.103 09-Jul-2007 ad

branches: 1.103.2; 1.103.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.102 30-Jun-2007 dsl

Add a flags parameter to kauth_cred_get/setgroups() so that sys_get/setgroups
can copy directly to/from userspace.
Avoids exposing the implementation of the group list as an array to code
outside kern_auth.c.
compat code and man page need updating.


# 1.101 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still needs work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.100 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


Revision tags: thorpej-atomic-base
# 1.99 11-Mar-2007 ad

branches: 1.99.2;
Add the LWP's runtime to kinfo_lwp.


# 1.98 09-Mar-2007 ad

branches: 1.98.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


Revision tags: ad-audiomp-base
# 1.97 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.96 15-Feb-2007 ad

branches: 1.96.2;
Count the number of CPUs at boot and stash in 'ncpu'. Eventually should
have each CPU register at attach, so we can figure out the topology for
the scheduler.


Revision tags: post-newlock2-merge
# 1.95 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase newlock2-base
# 1.94 22-Jan-2007 elad

Don't rely on KAUTH_PROCESS_CANSEE for environment just yet,
otherwise we're allowing anyone to read the environment unless
curtain is enabled.

From yamt@.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 netbsd-4-base
# 1.93 27-Nov-2006 elad

branches: 1.93.2;
Move Veriexec's sysctl(9) setup routine and helper to kern_verifiedexec.c.


# 1.92 25-Nov-2006 christos

PR/34837: Mindaguas: Add SysV SHM dynamic reallocation and locking to the
physical memory


# 1.91 01-Nov-2006 christos

implement kern.arandom properly, instead of lying about it and only filling
the first 4 bytes of the array with random data.


# 1.90 29-Oct-2006 christos

add the emulation in kinfo_proc2


Revision tags: yamt-splraiseipl-base2
# 1.89 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.88 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


# 1.87 24-Sep-2006 dogcow

correct dcopyout #define for !KTRACE case.


# 1.86 23-Sep-2006 manu

Add a -t+S flag to ktrace for tracing activity related to sysctl. MIB
names will be displayed, with data read and written as well.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.85 13-Sep-2006 elad

branches: 1.85.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.84 10-Sep-2006 manu

When getting the program argument or environment strings, we previously
assumed that all the strings were stored in a row, separated by NUL chars,
at the address pointed to by argv[0] (or envp[0]).

This was wrong: if the program changed argv[0], we still read the
first string correctly, but the following strings contained unexpected data.

The fix: read the whole argv (or envp) array, then copy the strings one by
one, using their addresses in argv (or envp).
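
A minimal user-space sketch of the approach described above, not the kernel's
actual code. The helper name copy_strings() and its signature are
hypothetical; it walks the pointer array and copies each string from its own
address instead of assuming the strings sit back to back after the first one.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy each NUL-terminated string named by
 * vec[0..n-1] into buf, one after another, each keeping its NUL.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t
copy_strings(char *const vec[], size_t n, char *buf, size_t buflen)
{
	size_t used = 0;

	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(vec[i]) + 1;	/* include the NUL */

		if (len > buflen - used)
			return 0;
		/* Follow the pointer; do not assume adjacency in memory. */
		memcpy(buf + used, vec[i], len);
		used += len;
	}
	return used;
}
```

Because every string is fetched through its own pointer, the result stays
correct even when a program has pointed one entry somewhere else.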


# 1.83 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: rpaulo-netinet-merge-pcb-base
# 1.82 08-Sep-2006 manu

When collecting a 32-bit process' argument or environment vector, we need
to convert the 32-bit pointers to the 64-bit environment.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.81 26-Jul-2006 dogcow

branches: 1.81.4;
at the request of elad, as veriexec.h has returned, revert the changes
from 2006-07-25.


# 1.80 25-Jul-2006 dogcow

mechanically go through and
s,include "veriexec.h",include <sys/verified_exec.h>,
as the former has apparently gone away.


# 1.79 24-Jul-2006 elad

some fixes:
- adapt to NVERIEXEC in init_sysctl.c.
- we now need "veriexec.h" for NVERIEXEC.
- "opt_verified_exec.h" -> "opt_veriexec.h", and include it only where
it is needed.


# 1.78 23-Jul-2006 ad

Use the LWP cached credentials where sane.


# 1.77 17-Jul-2006 ad

- Don't cast kauth_cred_t to (struct ucred *), just set pc_ucred = NULL.
- Fill ucred::cr_ref.


# 1.76 16-Jul-2006 elad

CURTAIN() -> KAUTH_GENERIC_CANSEE.


# 1.75 14-Jul-2006 elad

move security.setid_core.* to kern.coredump.setid.*, as requested by yamt@.


Revision tags: yamt-pdpolicy-base6
# 1.74 21-Jun-2006 christos

Don't leak memory on success. Allocate only the type of struct that we'll
need for efficiency.


# 1.73 20-Jun-2006 christos

don't allocate too much stuff on the stack.


Revision tags: chap-midi-nbase chap-midi-base
# 1.72 17-Jun-2006 yamt

sysctl_security_setidcorename: don't allocate MAXPATHLEN bytes on stack.


Revision tags: gdamore-uart-base
# 1.71 13-Jun-2006 yamt

branches: 1.71.2;
remove unnecessary arguments from kauth_authorize_process.
ie. make it similar to the one found in apple TN.


# 1.70 13-Jun-2006 yamt

sysctl_kern_file, sysctl_kern_file2: don't abuse kauth_authorize_process
for non-process objects.


# 1.69 13-Jun-2006 yamt

sysctl_kern_file2: fix an indent.


Revision tags: yamt-pdpolicy-base5 simonb-timecounters-base
# 1.68 14-May-2006 elad

branches: 1.68.2;
integrate kauth.


Revision tags: elad-kernelauth-base
# 1.67 17-Apr-2006 elad

Move securelevel-specific stuff to its own file.


# 1.66 14-Apr-2006 blymn

Make i/o statistics collection more generic, include tape drives and
nfs mounts in the set of devices that statistics will be reported on.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.65 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.64 26-Mar-2006 erh

When DIAGNOSTIC is defined, provide a kern.panic_now sysctl to conveniently
and reliably panic the system.


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.63 01-Mar-2006 yamt

branches: 1.63.2; 1.63.4; 1.63.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.62 04-Feb-2006 yamt

for some random places, use PNBUF_GET/PUT rather than
- on-stack buffer
- malloc(MAXPATHLEN)


# 1.61 02-Feb-2006 elad

branches: 1.61.2;
implement a security.setid_core node as discussed on tech-kern@ and
tech-security@.


# 1.60 27-Jan-2006 elad

branches: 1.60.2;
remove security node sysctl objects; they are now created using CTL_CREATE.


# 1.59 26-Dec-2005 perry

branches: 1.59.2;
u_intN_t -> uintN_t


# 1.58 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.57 05-Dec-2005 christos

- make settime take timespec.
- avoid wrapping of time in settime.
- pass struct proc down so that we can log a detailed message.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.56 08-Oct-2005 yamt

sysctl_kern_proc_args: don't assume that the process is
resident while we are sleeping.


# 1.55 07-Sep-2005 elad

Implement curtain in KERN_{PROC,PROC2,FILE,FILE2,PROC_ARGS}.
While I'm here, disable curtain by default.


# 1.54 07-Sep-2005 elad

Introduce ``security.curtain'', new node for security features and
settings, and new variable for controlling access to objects based
on user-id.


# 1.53 06-Sep-2005 rpaulo

Implement kern.hardclock_ticks.


# 1.52 24-Aug-2005 simonb

Fix a tyop in a comment.


# 1.51 13-Aug-2005 blymn

Remove the tape stats from here, they caused issues on non-scsipi
architectures.


# 1.50 08-Aug-2005 blymn

Don't include tape stats functions if no devices configured.


# 1.49 07-Aug-2005 blymn

Add tape statistics gathering functions.


# 1.48 29-Jul-2005 elad

#ifdef VERIFIED_EXEC


# 1.47 16-Jul-2005 christos

defopt verified_exec.


# 1.46 17-Jun-2005 atatat

branches: 1.46.2;
Comment in new cp_id implementation was wrong since I abandoned
rewriting it in favor of some testing and then never got back to it.
It's better now.


# 1.45 16-Jun-2005 christos

Add a new sysctl 'cp_id' that returns the array of cpu id values. Requested by
me, implemented by atatat.


# 1.44 15-Jun-2005 elad

Fix sysctl handling for raise-only variables. This affected the veriexec
node entirely. Reported by Nino Dehne.


# 1.43 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.42 06-Jun-2005 jdc

Revert previous ('_ncpus' is now 'ncpus' again).
MI variable names have precedence.


# 1.41 05-Jun-2005 jdc

Rename 'ncpus' to '_ncpus', otherwise we shadow sparc/sparc64's 'ncpus'
when MULTIPROCESSOR is defined.


# 1.40 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.39 22-May-2005 elad

Add indication for number of fingerprinted files on each device.

When a table is created for a new device, a new variable is created
under the kern.veriexec.count node named "dev_<id>". For example,
dev_0, dev_3, etc.


# 1.38 19-May-2005 elad

Some changes in veriexec.

New features:

- Add a veriexec_report() routine to make most reporting consistent and
remove some common code.
- Add 'strict' mode that controls how veriexec behaves.
- Add sysctl knobs:
o kern.veriexec.verbose controls verbosity levels. Value: 0, 1.
o kern.veriexec.strict controls strict level. Values: 0, 1, 2. See
documentation in sysctl(3) for details.
o kern.veriexec.algorithms returns a string with a space separated
list of supported hashing algorithms in veriexec.
- Updated documentation in man pages for sysctl(3) and sysctl(8).

Bug fixes:

- veriexec_removechk(): Code cleanup + handle FINGERPRINT_NOTEVAL
correctly.
- exec_script(): Don't pass 0 as flag when executing a script; use the
defined VERIEXEC_INDIRECT - which is 1. Makes indirect execution
enforcement work.
- Fix some printing formats and types.


Revision tags: kent-audio2-base
# 1.37 18-Apr-2005 mrg

be explicit in the description for POSIX saved set-id that this is for
POSIX-style, not sane-style. (ie, add "POSIX " to the description.)


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.36 11-Mar-2005 atatat

branches: 1.36.2;
Revert the change that made kern.file2 and net.*.*.pcblist into nodes
instead of structs. It had other deleterious side-effects that are
rather nasty. Another solution must be found.


# 1.35 10-Mar-2005 atatat

Change types of kern.file2 and net.*.*.pcblist to NODE


# 1.34 09-Mar-2005 atatat

Add kern.file2. As kern.proc2 is to kern.proc, so is kern.file2 to
kern.file, namely a 32/64 bit clean sysctl interface to the same data.
It also borrows a few things from struct vnode (if applicable) and
from struct proc, just to tie things together a bit more.

You can walk this list "by file" or "by pid". The former method is
similar to kern.file but omits the filehead, and the latter can give
you duplicates if multiple processes have the same struct file open,
but tells you which process it is.


# 1.33 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.32 01-Oct-2004 yamt

branches: 1.32.4; 1.32.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.31 27-Jul-2004 atatat

branches: 1.31.2;
The message buffer datum instrumented by KERN_MSGBUFSIZE is actually a
long, not an int, and this causes "problems" on LP64be machines
(sparc64, etc). Assign the value to a temporary int and instrument
that instead. Should be fine until someone wants a message buffer
larger than two gigabytes.
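
The LP64 big-endian problem can be illustrated with a small simulation. This
is only a sketch: store_be64() and load_be32() are hypothetical helpers that
model the byte layout explicitly, so the result is the same on any host.

```c
#include <stdint.h>

/*
 * Lay out v as 8 big-endian bytes, the way an LP64 big-endian
 * machine stores a long.
 */
void
store_be64(unsigned char out[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (unsigned char)(v >> (56 - 8 * i));
}

/*
 * Read 4 bytes as a big-endian 32-bit value: what a consumer sees
 * when it treats the start of the long's storage as an int.
 */
uint32_t
load_be32(const unsigned char in[4])
{
	return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
	    ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```

For a stored size of 16384, the first four bytes read back as 0 and the value
sits in the last four, which is why the commit instruments a temporary int
rather than the long itself.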


# 1.30 26-May-2004 christos

(off_t)(long) is wrong when it comes to kernel addresses [because on a 32 bit
machine if the high bit is set they turn negative]. Make an intermediate cast
to unsigned long.
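
A sketch of the sign-extension issue, under the assumption that int32_t and
uint32_t stand in for long and unsigned long on a 32-bit machine, and int64_t
for off_t. The function names are hypothetical. The narrowing conversion in
the first function is implementation-defined in older C standards, but it
wraps on common two's complement platforms.

```c
#include <stdint.h>

/* Direct cast: the value is sign-extended, so high addresses go negative. */
int64_t
addr_to_off_signed(uint32_t addr)
{
	return (int64_t)(int32_t)addr;
}

/* With an intermediate unsigned cast the value is zero-extended. */
int64_t
addr_to_off_unsigned(uint32_t addr)
{
	return (int64_t)(uint32_t)addr;
}
```

Addresses below 0x80000000 convert identically either way; only addresses
with the high bit set differ.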


# 1.29 03-May-2004 martin

Fix a comment.
Approved by Andrew Brown.


# 1.28 23-Apr-2004 simonb

s/the the/the/ (only in sources that aren't regularly imported from
elsewhere).


# 1.27 16-Apr-2004 atatat

Prefer that kern.hostid is printed in hex, not as a signed decimal,
and avoid accidental sign-extension when setting it.


# 1.26 08-Apr-2004 atatat

Lots of sysctl descriptions (if someone wants to help out here, that
would be good) mostly copied from sysctl(3). This takes care of the
top-level, most of kern.* and hw.* (modulo the ath and bge stuff), and
all of proc.*.

If you don't want the added rodata in your kernel, use "options
SYSCTL_NO_DESCR" in your kernel config.


# 1.25 08-Apr-2004 atatat

Clear out the struct kinfo_drivers before stuffing things into it.
Avoids leaking garbage from the stack (left over from the earlier
call to sysctl_locate()).
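
The pattern can be sketched as follows. struct drv_info and fill_drv_info()
are hypothetical stand-ins for struct kinfo_drivers and the code that fills
it; the point is only that the record is zeroed before any field is set, so
padding and unused tail bytes never carry stale stack contents.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record, standing in for struct kinfo_drivers. */
struct drv_info {
	int	major;
	char	name[24];
};

void
fill_drv_info(struct drv_info *di, int major, const char *name)
{
	/* Clear first: padding and the tail of name[] hold no stale bytes. */
	memset(di, 0, sizeof(*di));
	di->major = major;
	/* snprintf() writes the string and its NUL, nothing beyond. */
	snprintf(di->name, sizeof(di->name), "%s", name);
}
```

Without the memset(), every byte of name[] past the terminating NUL would be
copied out exactly as it happened to be left on the stack.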


Revision tags: netbsd-2-0-base
# 1.24 24-Mar-2004 atatat

branches: 1.24.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.23 17-Mar-2004 yamt

- move kern.somaxkva sysctl stuff from init_sysctl.c to uipc_socket.c.
- when changing its value, wakeup sokva waiters.


# 1.22 21-Feb-2004 atatat

Use KERN_PROCSLOP for struct kinfo_proc and KERN_LWPSLOP for
struct kinfo_lwp, and not vice versa.

Should solve the issue with top dying because it's unable to "allocate
memory".


# 1.21 19-Feb-2004 atatat

Use new PTRTOUINT64() macro instead of local PTRTOINT64() macro.


# 1.20 17-Jan-2004 atatat

Avoid dereferencing l...it might be NULL


# 1.19 28-Dec-2003 atatat

Sysctl functions called for "generic" nodes should forward "query"
requests (where possible), rather than returning errors.


# 1.18 28-Dec-2003 atatat

Adjust error returns in kern.cp_time when a specific processor is
being requested so that (1) the uniprocessor case and the
multiprocessor case are more similar and (2) so that we return ENOENT
when a non-existent processor is requested (which is both more
sensible and follows the general order of things anyway).


# 1.17 28-Dec-2003 atatat

Rename sysctl_kern_hostname() to sysctl_setlen() and use it also for
domainname. Note that there's no need to copy rnode since we're not
changing any of it, nor protecting anything from change.

Thanks to martin for initial work.


# 1.16 28-Dec-2003 atatat

RCSid police


# 1.15 28-Dec-2003 martin

After changing hostname, adjust hostnamelen.
This closes PR kern/23907.


# 1.14 26-Dec-2003 martin

Make kern.rtc_offset writable at securelevel <= 0.
This allows boot-time adjustment when a machine runs other OSes with
RTC == localtime.


# 1.13 20-Dec-2003 yamt

update a comment to match with the previous change (rev.1.12).


# 1.12 20-Dec-2003 yamt

restore functionality to decrease kern.maxvnodes which
has been backed out during sysctl rework.


# 1.11 12-Dec-2003 simonb

In sysctl_kern_lwp adjust offsets into the mib entries so that
they are now correct. Fixes problems with "ps -s" not working.
Also use KERN_LWPSLOP instead of KERN_PROCSLOP.

Both changes from Andrew Brown.


# 1.10 10-Dec-2003 atatat

Make kern.dump_on_panic writeable again, too


# 1.9 09-Dec-2003 atatat

Make kern.sbmax writeable again as well.

From a follow-on to PR kern/23695 by a Mr. Davis, which I missed at a
quick glance.


# 1.8 09-Dec-2003 atatat

Make kern.logsigexit writeable again.

Fixes PR kern/23695.


# 1.7 07-Dec-2003 martin

Add missing break.


# 1.6 07-Dec-2003 he

Also make declaration of sysctl_kern_maxptys() depend on NPTY > 0.
Makes the mvme68k RAMDISK kernel compile again.


# 1.5 06-Dec-2003 martin

Fix kern.cp_time for MULTIPROCESSOR kernels: calculate size of result
correctly, free original instead of incremented pointer, copy results for
n = -2 case too, so top shows correct stats.
Additionally, rearrange code for better readability (from Andrew).


# 1.4 06-Dec-2003 fvdl

Include opt_posix.h for the P1003_1B_SEMAPHORE define.
Include <machine/cpu.h> just to be sure.


# 1.3 06-Dec-2003 martin

We can not count CPUs at sysctl initialization time - so don't make
hw.ncpu an immediate value.


# 1.2 06-Dec-2003 atatat

#include "opt_multiprocessor.h"

This makes hw.ncpu and kern.cp_time work better on those platforms.


# 1.1 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.213 01-Jun-2017 chs

remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.212 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.211 31-May-2016 pgoyette

branches: 1.211.2;
Add a new kern.messages sysctl to allow kernel message verbosity to be
altered after boot.

Fixes PR kern/46539 using patch submitted by Nat Sloss.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.210 09-Nov-2015 pgoyette

Whether or not the semaphore code is loaded as a module or built-in, its
sysctl data belongs with the module code. Move it from kern/init_sysctl.c
to kern/uipc_sem.c

While here, add a new sysctl variable kern.posix.semcnt (current count of
semaphores) to complement the existing kern.posix.semmax (maximum number
of semaphores).


Revision tags: nick-nhusb-base-20150921
# 1.209 25-Aug-2015 pooka

Move a bunch of sysctl nodes from init_sysctl (kitchen sink sysctl file)
to init_sysctl_base (only base kernel defs). Main motivation was to
fix sysconf(_SC_NPROCESSORS) for Rumprun. As reported by neeraj on irc,
it returned -1 before this fix, so we were doing imaginary computing.


# 1.208 07-Jul-2015 justin

Move hw.machine and hw.machine_arch sysctls to base so rump can use them

This allows uname(3) and uname(1) to work on rump kernels.


Revision tags: nick-nhusb-base-20150606
# 1.207 20-May-2015 pooka

group msgbuf sysctls with the msgbuf code
(init_sysctl.c -> subr_log.c)


# 1.206 13-May-2015 pgoyette

More preparation for modularizing the SYSVxxx options. Here we
change the kern.ipc.sysvxxx sysctls into dynamic values, so each
sub-component of SYSVxxx can declare its own availability.


# 1.205 22-Apr-2015 pooka

move clock sysctls from init_sysctl.c to kern_clock.c


Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.204 03-Aug-2014 apb

branches: 1.204.4;
BUILDINFO part 2: expose sysctl kern.buildinfo


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.203 08-May-2014 hannken

Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15
# 1.202 24-Mar-2014 christos

branches: 1.202.2;
- create cpu_{g,s}etmodel() and hide cpu_model from direct access.


Revision tags: riastradh-drm2-base3
# 1.201 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.200 25-Feb-2014 justin

Add kern.{ostype,osrelease,osrevision,version} kern.domainname,
kern.rawpartition sysctl support to rump kernel.
Moved the sysctl support that is shared between rump and normal
kernels to init_sysctl_base.c as rump cannot use init_sysctl.c
in order to avoid code duplication. Agreed with pooka@.


# 1.199 17-Jan-2014 pooka

Put cprng sysctls into subr_cprng.c. Also, make sysctl_prng static
in subr_cprng and get rid of SYSCTL_PRIVATE namespace leak macro.

Fixes ping(8) when run against a standalone rump kernel due to appearance
of the kern.urandom sysctl node (in case someone was wondering ...)


# 1.198 14-Sep-2013 joerg

GC various arrays defined and used in kern_proc.c


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.197 18-Mar-2013 para

branches: 1.197.6;
calculate vnode cache size based on the resource it gets allocated from
this prevents kern.maxvnodes from being set so high that it exhausts the
available space in kmem

http://mail-index.netbsd.org/tech-kern/2013/03/08/msg015095.html


# 1.196 07-Mar-2013 matt

Add a kern.configname sysctl object.


# 1.195 21-Feb-2013 pgoyette

Move boottime50 and its associated sysctl into the compat module. As
noted on tech-kern. Should fix PR/47579.

OK christos@

Will request pull-up to 6.0 in a few days.


# 1.194 02-Feb-2013 matt

Make the inclusion of <sys/cprng.h> a private matter for sysctl. No reason
to expose the rest of the kernel to it.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.193 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


# 1.192 08-Oct-2012 pooka

put all kern socket sysctls in the same place


# 1.191 03-Oct-2012 mlelstv

Add sanity check to sysctl_kern_maxvnodes.


# 1.190 02-Jun-2012 dsl

branches: 1.190.2;
Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.
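
One way such a compile-time check could look, as a sketch only: it uses C11
_Generic, which is an assumption and not necessarily how the commit does it.
The XCTL* names and tag values are hypothetical and are not the real CTLTYPE_*
definitions.

```c
#include <stdint.h>

/* Hypothetical tags, not the real CTLTYPE_* values. */
#define XCTLTYPE_INT	1
#define XCTLTYPE_QUAD	2

/* Map the type of ptr to a tag at compile time (C11 _Generic). */
#define XCTL_TAG_OF(ptr) _Generic((ptr),	\
	int *:		XCTLTYPE_INT,		\
	int64_t *:	XCTLTYPE_QUAD,		\
	uint64_t *:	XCTLTYPE_QUAD,		\
	default:	0)

/* Evaluates to 1 when the declared tag matches the pointer's type. */
#define XCTL_TYPE_OK(tag, ptr)	(XCTL_TAG_OF(ptr) == (tag))
```

With this, passing the address of a uint64_t while declaring an int type
evaluates to 0, which is the class of bug the commit reports fixing. A void *
argument falls into the default case, which is why a real check would need to
exempt the explicit (void *) casts mentioned above.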


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.189 07-Apr-2012 christos

remove bogus check.


Revision tags: jmcneill-usbmp-base8 jmcneill-usbmp-base7
# 1.188 10-Mar-2012 joerg

P1003_1B_SEMAPHORE is no longer optional.


Revision tags: jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.187 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-0-5-RELEASE netbsd-6-0-4-RELEASE netbsd-6-0-3-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.186 17-Dec-2011 tls

branches: 1.186.2;

Separate /dev/random pseudodevice implementation from kernel entropy pool
implementation. Rewrite pseudodevice code to use cprng_strong(9).

The new pseudodevice is cloning, so each caller gets bits from a stream
generated with its own key. Users of /dev/urandom get their generators
keyed on a "best effort" basis -- the kernel will rekey generators
whenever the entropy pool hits the high water mark -- while users of
/dev/random get their generators rekeyed every time key-length bits
are output.

The underlying cprng_strong API can use AES-256 or AES-128, but we use
AES-128 because of concerns about related-key attacks on AES-256. This
improves performance (and reduces entropy pool depletion) significantly
for users of /dev/urandom but does cause users of /dev/random to rekey
twice as often.

Also fixes various bugs (including some missing locking and a reseed-counter
overflow in the CTR_DRBG code) found while testing this.

For long reads, this generator is approximately 20 times as fast as the
old generator (dd with bs=64K yields 53MB/sec on 2Ghz Core2 instead of
2.5MB/sec) and also uses a separate mutex per instance so concurrency
is greatly improved. For reads of typical key sizes for modern
cryptosystems (16-32 bytes) performance is about the same as the old
code: a little better for 32 bytes, a little worse for 16 bytes.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base
# 1.185 20-Nov-2011 tls

branches: 1.185.2;
An undocumented behavior of the sysctl kern.arandom node used to allow
sucking up to 8192 bytes out of the kernel arc4random() generator at a
time. Supposedly some very old application code uses this to rekey
other instances of RC4 in userspace (a truly great idea). Reduce the
limit to 256 bytes -- and note that it will probably be reduced to
sizeof(int) in the future, since this node is so documented.


# 1.184 19-Nov-2011 tls

First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.


Revision tags: jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.183 30-Aug-2011 bouyer

branches: 1.183.2;
Add getlabelusesmbr(), as proposed in
http://mail-index.netbsd.org/tech-userlevel/2011/08/25/msg005404.html
This is used by disk tools such as disklabel(8) to dynamically decide whether
the underlying platform uses a disklabel-in-mbr-partition or not
(instead of using a compile-time list of ports).
getlabelusesmbr() reads the sysctl kern.labelusesmbr, takes its value from the
machdep #define LABELUSESMBR.
For evbmips, make LABELUSESMBR 1 if the platform uses pmon
as bootloader, and 0 (the previous value) otherwise.


# 1.182 23-Jul-2011 jym

When KERN_SA is not defined, kern.no_sa_support is a constant (1). So
add CTLFLAG_IMMEDIATE to flags. Make the macro block logically reversed so
it looks more natural when reading.

Reported by Peter Tworek on tech-kern@.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.181 24-May-2011 joerg

Add some needed __UNCONST


# 1.180 02-Apr-2011 rmind

vfs_drainvnodes: drop lwp argument, remove variable name in prototype.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.179 05-Feb-2011 christos

avoid code duplication.


# 1.178 28-Jan-2011 pooka

migrate compat32 handling with previous

pointed out by Lars Heidieker


# 1.177 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


# 1.176 22-Jan-2011 christos

Use the L_ flags instead of the P_ flags for lwps.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.175 01-Jul-2010 rmind

branches: 1.175.2; 1.175.4;
Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


# 1.174 16-Jun-2010 pooka

Set kinfo_lwp to 0 before filling it so that if someone removes
variable assignments from here, kernel memory does not leak to
userspace.

Bug found, a little bit surprisingly, by the atf ps test, which failed
because the -o holdcnt column was too wide: the contents displayed
were garbage.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.173 13-Feb-2010 yamt

branches: 1.173.2;
sysctl_doeproc: don't follow a possibly stale pointer.


Revision tags: uebayasi-xip-base
# 1.172 13-Jan-2010 pooka

branches: 1.172.2;
Minimize unnecessary differences in rump.


# 1.171 24-Dec-2009 elad

When reporting open files using sysctl, don't use 'filehead' to fetch files,
as we don't have a process context to authorize on. Instead, traverse the
file descriptor table of each process -- as we already do in one case.

Introduce a "marker" we can use to mark files we've seen in an iteration, as
the same file can be referenced more than once.

Hopefully this availability of filtering by process also makes life easier
for those who are interested in implementing process "containers" etc.


Revision tags: matt-premerge-20091211
# 1.170 12-Dec-2009 dsl

Report L_INMEM in the lwp info as well.


# 1.169 12-Dec-2009 dsl

Always set L_INMEM to maintain binary compatibility.


Revision tags: jym-xensuspend-nbase
# 1.168 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.167 16-Sep-2009 pooka

Chop init_sysctl into base nodes (init_sysctl_base.c) and the
kitchen sink (init_sysctl.c). Further surgery may be needed down
the line.


Revision tags: yamt-nfs-mp-base8
# 1.166 11-Sep-2009 apb

Expose the kernel's boothowto(9) variable through the sysctl
kern.boothowto variable.

Part of the /etc/rc silent changes requested in PR 41946
and proposed in tech-userlevel.


Revision tags: yamt-nfs-mp-base7
# 1.165 16-Aug-2009 christos

provide compatibility for the older variant of kern.consdev, which used
a 32 bit dev_t. Reported by mrg.


Revision tags: jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.164 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


# 1.163 16-May-2009 yamt

sysctl_doeproc:
- simplify.
- KERN_PROC: fix possible stale proc pointer dereference.
- KERN_PROC: don't do copyout with proc_lock held.


Revision tags: yamt-nfs-mp-base4 jym-xensuspend-base
# 1.162 12-May-2009 yamt

don't forget to skip marker processes.


# 1.161 04-May-2009 yamt

sysctl_doeproc: fix a bug in rev.1.135.
don't forget to mark our marker process PK_MARKER.
this fixes crashes in sched_pstats, etc.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.160 29-Mar-2009 mrg

- add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.

- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.

- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)

- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)

- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.

- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but it's best left
as part of the full-update of compat/freebsd.)


this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.

tested on i386 and sparc64, build tested on several other platforms.

thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)


# 1.159 11-Mar-2009 mrg

like KERN_FILE2: *do* update "needed" when there is no count. we want
userland to know what sort of size to provide..

while here, slightly normalise the previous to init_sysctl.c.


# 1.158 11-Mar-2009 mrg

always calculate "needed" for KERN_FILE2 calls. this allows a caller
to get an estimate of the needed space, like the intention is.
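
A rough sketch of the "always report the needed size" pattern follows. It is an illustration under assumptions, not the KERN_FILE2 handler: demo_export() and its int records are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical exporter: copies as many whole records as fit into buf
 * (buf may be NULL for a pure size query) and always reports the total
 * space required in *needed. Returns the number of bytes written.
 */
static size_t
demo_export(const int *recs, size_t nrecs, void *buf, size_t buflen,
    size_t *needed)
{
	size_t written = 0;

	assert(needed != NULL);
	*needed = 0;
	for (size_t i = 0; i < nrecs; i++) {
		/* Copy only while a whole record still fits. */
		if (buf != NULL && buflen - written >= sizeof(recs[i])) {
			memcpy((char *)buf + written, &recs[i],
			    sizeof(recs[i]));
			written += sizeof(recs[i]);
		}
		/* Accumulate the size whether or not we copied. */
		*needed += sizeof(recs[i]);
	}
	return written;
}
```

A caller can first pass a NULL buffer to learn the size, allocate, and then call again; a short buffer still yields the full size in *needed.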


# 1.157 08-Mar-2009 ad

Don't bother with file_t::f_iflags any more, as it's not used.
Noted by mrg@.


Revision tags: nick-hppapmap-base2
# 1.156 13-Feb-2009 apb

Use "defopt MODULAR" in sys/conf/files, and #include "opt_modular.h"
in all kernel sources that use the MODULAR option.
Proposed in tech-kern on 18 Jan 2009.


Revision tags: mjf-devfs2-base
# 1.155 17-Jan-2009 cegger

branches: 1.155.2;
whitespace nit


# 1.154 17-Jan-2009 yamt

malloc -> kmem_alloc.


# 1.153 11-Jan-2009 christos

merge christos-time_t


Revision tags: christos-time_t-nbase christos-time_t-base
# 1.152 29-Dec-2008 pooka

Rename specfs_lock as device_lock and move it from specfs to devsw.
Relaxes kernel dependency on vfs.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.151 28-Nov-2008 elad

PR/40002: Daniel Horecki: sockstat doesn't work for user with sysctl
security.curtain=1

If the kauth call failed, we'd silently continue the loop, but the error
code would remain and eventually "leak" to userspace. Reset the error to
zero when continuing.

Tested by snj@ and myself. Okay snj@.
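
The error-leak pattern described here can be sketched in isolation. The names demo_authorize() and demo_list() are hypothetical, and the even/odd rule merely simulates a check that fails for some entries.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical permission check: only even ids are visible to the caller. */
static int
demo_authorize(int id)
{
	return (id % 2 == 0) ? 0 : EPERM;
}

/*
 * Count the entries the caller may see. A failed check skips the entry,
 * and error is reset so the failure is not returned once the loop ends.
 */
static int
demo_list(const int *ids, size_t n, size_t *visible)
{
	int error = 0;

	assert(visible != NULL);
	*visible = 0;
	for (size_t i = 0; i < n; i++) {
		error = demo_authorize(ids[i]);
		if (error != 0) {
			error = 0;	/* skip quietly; do not leak EPERM */
			continue;
		}
		(*visible)++;
	}
	return error;
}
```

Without the reset, a list whose last entry fails the check would return EPERM to the caller even though the visible entries were processed correctly.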


# 1.150 12-Nov-2008 ad

Allow the POSIX semaphore code to be loaded as a module.


Revision tags: netbsd-5-base matt-mips64-base2
# 1.149 22-Oct-2008 ad

branches: 1.149.2; 1.149.4;
Set kern.posix_semaphores at runtime so it can be a module.
(Picked wrong header the last time.)


# 1.148 22-Oct-2008 ad

Set kern.posix_semaphores at runtime so it can be a module.


Revision tags: haad-dm-base1
# 1.147 19-Oct-2008 christos

rename proc_representative_lwp to proc_active_lwp and clarify it is for
ps display purposes. suggested by rmind.


# 1.146 19-Oct-2008 christos

Select a "representative" lwp instead of the first lwp in the list. The
first lwp in the list is the last created and in the firefox and gtk-gnash
case this is usually a zombie, so the status in ps was ZLl. This now picks
the lwp in order ONPROC > RUN > SLEEP > STOP > SUSPENDED > IDL > DEAD > ZOMB
and breaks ties using cpticks.
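
The selection order above can be sketched as follows. The enum, struct, and demo_pick() are hypothetical, and the direction of the tie-break (more cpticks wins) is an assumption, since the message does not state it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states, listed from least to most preferred. */
enum demo_state {
	DEMO_ZOMB, DEMO_DEAD, DEMO_IDL, DEMO_SUSPENDED,
	DEMO_STOP, DEMO_SLEEP, DEMO_RUN, DEMO_ONPROC
};

struct demo_lwp {
	enum demo_state state;
	unsigned int    cpticks;
};

/*
 * Return the most "representative" entry: highest-ranked state first,
 * then (by assumption) the larger cpticks value. NULL if n is 0.
 */
static const struct demo_lwp *
demo_pick(const struct demo_lwp *lwps, size_t n)
{
	const struct demo_lwp *best = NULL;

	assert(lwps != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct demo_lwp *l = &lwps[i];

		if (best == NULL || l->state > best->state ||
		    (l->state == best->state && l->cpticks > best->cpticks))
			best = l;
	}
	return best;
}
```

With a zombie first in the list and two sleeping entries behind it, the sketch skips the zombie and picks the sleeping entry with more cpticks.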


# 1.145 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.144 15-Jul-2008 christos

make l_flags contain more stuff. Fixes top thread display where system processes
were always displayed.


# 1.143 02-Jul-2008 rmind

branches: 1.143.2;
Remove proc_representative_lwp(), use a simple LIST_FIRST() instead.
OK by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 wrstuden-revivesa-base
# 1.142 16-Jun-2008 ad

PR kern/38927: processes getting stuck in uvm_map (cv_timedwait), hanging
machine

Assume that a vnode (and associated data structures) costs 2kB in the
worst imaginable case. Don't allow sysctl to set desiredvnodes to a
value that would use more than 75% of KVA or 75% of physical memory.
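
The limit can be worked through with a small arithmetic sketch. The function names are hypothetical, and the sketch reads "2kB" as 2048 bytes and applies the 75% cap to the smaller of the two sizes; both are assumptions about the intent, not the kernel's actual code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest vnode count whose worst-case footprint stays within 75% of
 * both kernel virtual address space and physical memory.
 */
static uint64_t
demo_max_vnodes(uint64_t kva_bytes, uint64_t phys_bytes, uint64_t cost_bytes)
{
	uint64_t limit = kva_bytes < phys_bytes ? kva_bytes : phys_bytes;

	assert(cost_bytes > 0);
	/* Divide before multiplying to avoid overflow on large sizes. */
	return (limit / 4 * 3) / cost_bytes;
}

/* Nonzero if a requested value would be accepted under that cap. */
static int
demo_vnodes_ok(uint64_t want, uint64_t kva_bytes, uint64_t phys_bytes)
{
	return want <= demo_max_vnodes(kva_bytes, phys_bytes, 2048);
}
```

For 1 GiB of KVA and 4 GiB of RAM, the cap works out to (2^30 * 3/4) / 2048 = 393216 vnodes.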


# 1.141 16-Jun-2008 ad

- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


Revision tags: yamt-pf42-base3
# 1.140 31-May-2008 ad

branches: 1.140.2;
Kill devsw_lock and just use specfs_lock. The two would need merging
in order to prevent unload of modules when a device that they provide
is still open.


# 1.139 25-May-2008 christos

don't forget to fill in the emulation.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.138 12-May-2008 ad

Use cpu_index(), not ci_cpuid.


# 1.137 30-Apr-2008 ad

branches: 1.137.2;
KERN_FILE_BYPID: fix locking botch.


# 1.136 29-Apr-2008 ad

Don't try grabbing a zombie's p_reflock.


# 1.135 29-Apr-2008 ad

PR kern/37917 /bin/ps no longer shows zombies


# 1.134 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.133 24-Apr-2008 ad

branches: 1.133.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.132 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.131 05-Apr-2008 yamt

branches: 1.131.2;
- l_wmesg is not always valid. check l_wchan when using l_wmesg.
should fix a crash reported by Juan RP on current-users@.
- ttyinfo: lock lwp when accessing l_wmesg.
- fill_lwp: add an assertion.


# 1.130 04-Apr-2008 cegger

use device_xname() where appropriate
OK martin


# 1.129 02-Apr-2008 xtraeme

Revert rev 1.126-1.128. The original code was correct and rmind and I
didn't look correctly at them.


# 1.128 01-Apr-2008 xtraeme

When copying l_name and l_wmesg use KI_LNAMELEN and KI_WMESGLEN
respectively, so that we don't care if l_name/wmesg is longer
than kl_name/wmesg and the KASSERTs added in previous can go away.


# 1.127 01-Apr-2008 xtraeme

Fix previous: use the length of l->l_foo not kl->l_foo and add
two KASSERTs to check for max length limits before copying.

As suggested by rmind@.


# 1.126 01-Apr-2008 xtraeme

fill_lwp: when copying l_wmesg and l_name, use the size of the string
not of the variable.

Found and ok by rmind@.


# 1.125 27-Mar-2008 ad

branches: 1.125.2;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.124 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.123 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.122 30-Jan-2008 ad

branches: 1.122.2; 1.122.6;
Another locking botch.


# 1.121 28-Jan-2008 ad

More file/proc locking fixes.


Revision tags: bouyer-xeni386-nbase
# 1.120 23-Jan-2008 elad

Tons of process scope changes.

- Add a KAUTH_PROCESS_SCHEDULER action, to handle scheduler related
requests, and add specific requests for set/get scheduler policy and
set/get scheduler parameters.

- Add a KAUTH_PROCESS_KEVENT_FILTER action, to handle kevent(2) related
requests.

- Add a KAUTH_DEVICE_TTY_STI action to handle requests to TIOCSTI.

- Add requests for the KAUTH_PROCESS_CANSEE action, indicating what
process information is being looked at (entry itself, args, env,
open files).

- Add requests for the KAUTH_PROCESS_RLIMIT action indicating set/get.

- Add requests for the KAUTH_PROCESS_CORENAME action indicating set/get.

- Make bsd44 secmodel code handle the newly added requests appropriately.

All of the above make it possible to issue finer-grained kauth(9) calls in
many places, removing some KAUTH_GENERIC_ISSUSER requests.

- Remove the "CAN" from KAUTH_PROCESS_CAN{KTRACE,PROCFS,PTRACE,SIGNAL}.

Discussed with christos@ and yamt@.


Revision tags: bouyer-xeni386-base
# 1.119 12-Jan-2008 ad

sysctl_kern_proc_args: avoid zero length allocation.


Revision tags: matt-armv6-base
# 1.118 07-Jan-2008 ad

Patch up sysctl locking:

- Lock processes, credentials, filehead etc correctly.
- Acquire a read hold on sysctl_treelock if only doing a query.
- Don't wire down the output buffer. It doesn't work correctly and the code
regularly does long term sleeps with it held - it's not worth it.
- Don't hold locks other than sysctl_lock while doing copyout().
- Drop sysctl_lock while doing copyout / allocating memory in a few places.
- Don't take kernel_lock for sysctl.
- Fix a number of bugs spotted along the way


# 1.117 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.116 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.115 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.114 10-Dec-2007 elad

- Use KAUTH_ARG() instead of casts,
- Don't ignore return value of settime() in sysctl_kern_rtc_offset(), as
suggested by yamt@.

Note: the kauth(9) call in sysctl_kern_rtc_offset() is bogus, but this will
be addressed separately.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base jmcneill-pm-base reinoud-bufcleanup-base
# 1.113 06-Nov-2007 ad

branches: 1.113.2; 1.113.4; 1.113.6;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.112 19-Oct-2007 ad

branches: 1.112.2;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h


Revision tags: yamt-x86pmap-base4
# 1.111 16-Oct-2007 christos

branches: 1.111.2;
Don't fail to produce the argument vector if the program has modified it
by deleting arguments. This is a popular practice, and failing means that
ps(1) prints (programname). For example, this is what XtOpenDisplay() does with
-geometry. This used to work before 2.0H, and the behavior is allowed and
hinted by POSIX. Found out by Anon Ymous.


# 1.110 16-Oct-2007 christos

- fix comment sentence capitalization.
- whitespace cleanup.
No functional changes.


# 1.109 15-Oct-2007 ad

Add _SC_NPROCESSORS_ONLN and _SC_NPROCESSORS_CONF for sysconf(). These
are extensions but are provided by many Unix systems.


Revision tags: yamt-x86pmap-base3
# 1.108 13-Oct-2007 rmind

sysctl_kern_lwp: Use a correct variable when rechecking if LWP still
exists after relocking. Found via CID: 4689. OK by <dsl>.


Revision tags: vmlocking-base
# 1.107 08-Oct-2007 ad

Merge from vmlocking: don't hold scheduler locks across copyout().


Revision tags: yamt-x86pmap-base2
# 1.106 28-Sep-2007 joerg

Add kern.no_sa_support to easily detect whether a kernel supports
Scheduler Activation or not. This is a negative name as ld.so.conf
conditionals treat undefined sysctls like 0.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.105 15-Aug-2007 ad

branches: 1.105.2; 1.105.4;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.104 06-Aug-2007 yamt

branches: 1.104.2;
remove a homegrown definition of CPU_INFO_FOREACH.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.103 09-Jul-2007 ad

branches: 1.103.2; 1.103.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.102 30-Jun-2007 dsl

Add a flags parameter to kauth_cred_get/setgroups() so that sys_get/setgroups
can copy directly to/from userspace.
Avoids exposing the implementation of the group list as an array to code
outside kern_auth.c.
compat code and man page need updating.


# 1.101 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.100 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


Revision tags: thorpej-atomic-base
# 1.99 11-Mar-2007 ad

branches: 1.99.2;
Add the LWP's runtime to kinfo_lwp.


# 1.98 09-Mar-2007 ad

branches: 1.98.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


Revision tags: ad-audiomp-base
# 1.97 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.96 15-Feb-2007 ad

branches: 1.96.2;
Count the number of CPUs at boot and stash in 'ncpu'. Eventually should
have each CPU register at attach, so we can figure out the topology for
the scheduler.


Revision tags: post-newlock2-merge
# 1.95 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase newlock2-base
# 1.94 22-Jan-2007 elad

Don't rely on KAUTH_PROCESS_CANSEE for environment just yet,
otherwise we're allowing anyone to read the environment unless
curtain is enabled.

From yamt@.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 netbsd-4-base
# 1.93 27-Nov-2006 elad

branches: 1.93.2;
Move Veriexec's sysctl(9) setup routine and helper to kern_verifiedexec.c.


# 1.92 25-Nov-2006 christos

PR/34837: Mindaugas: Add SysV SHM dynamic reallocation and locking to the
physical memory


# 1.91 01-Nov-2006 christos

implement kern.arandom properly, instead of lying about it and only filling
the first 4 bytes of the array with random data.


# 1.90 29-Oct-2006 christos

add the emulation in kinfo_proc2


Revision tags: yamt-splraiseipl-base2
# 1.89 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.88 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


# 1.87 24-Sep-2006 dogcow

correct dcopyout #define for !KTRACE case.


# 1.86 23-Sep-2006 manu

Add a -t+S flag to ktrace for tracing activity related to sysctl. MIB
names will be displayed, with data read and written as well.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.85 13-Sep-2006 elad

branches: 1.85.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.84 10-Sep-2006 manu

When getting the program argument or environment strings, we previously
assumed that all the strings were stored in a row, separated by NUL chars,
at the address pointed to by argv[0] (or envp[0]).

This was wrong: if the program changed argv[0], we still read the
first string correctly, but the next strings did contain unexpected data.

The fix: read the whole argv (or envp) array, then copy the strings one by
one, using their addresses in argv (or envp)
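
The fix described above amounts to following each pointer in the vector rather than walking memory from the first string. A userland sketch of that idea, with a hypothetical demo_pack_strings() helper and no cross-address-space copying, might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Pack the strings referenced by vec[0..n-1] into buf, each followed by
 * its NUL terminator, following every pointer individually rather than
 * assuming the strings sit contiguously after vec[0].
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t
demo_pack_strings(char *const *vec, size_t n, char *buf, size_t buflen)
{
	size_t off = 0;

	assert(vec != NULL && buf != NULL);
	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(vec[i]) + 1;	/* include the NUL */

		if (len > buflen - off)
			return 0;
		memcpy(buf + off, vec[i], len);
		off += len;
	}
	return off;
}
```

Because each entry is read through its own pointer, a program that redirects its first argument to a different buffer still produces a correct list.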


# 1.83 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: rpaulo-netinet-merge-pcb-base
# 1.82 08-Sep-2006 manu

When collecting a 32 bit process' argument or environment vector, we need
to convert 32 bit pointers to the 64 bit environment


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.81 26-Jul-2006 dogcow

branches: 1.81.4;
at the request of elad, as veriexec.h has returned, revert the changes
from 2006-07-25.


# 1.80 25-Jul-2006 dogcow

mechanically go through and
s,include "veriexec.h",include <sys/verified_exec.h>,
as the former has apparently gone away.


# 1.79 24-Jul-2006 elad

some fixes:
- adapt to NVERIEXEC in init_sysctl.c.
- we now need "veriexec.h" for NVERIEXEC.
- "opt_verified_exec.h" -> "opt_veriexec.h", and include it only where
it is needed.


# 1.78 23-Jul-2006 ad

Use the LWP cached credentials where sane.


# 1.77 17-Jul-2006 ad

- Don't cast kauth_cred_t to (struct ucred *), just set pc_ucred = NULL.
- Fill ucred::cr_ref.


# 1.76 16-Jul-2006 elad

CURTAIN() -> KAUTH_GENERIC_CANSEE.


# 1.75 14-Jul-2006 elad

move security.setid_core.* to kern.coredump.setid.*, as requested by yamt@.


Revision tags: yamt-pdpolicy-base6
# 1.74 21-Jun-2006 christos

Don't leak memory on success. Allocate only the type of struct that we'll
need for efficiency.


# 1.73 20-Jun-2006 christos

don't allocate too much stuff on the stack.


Revision tags: chap-midi-nbase chap-midi-base
# 1.72 17-Jun-2006 yamt

sysctl_security_setidcorename: don't allocate MAXPATHLEN bytes on stack.


Revision tags: gdamore-uart-base
# 1.71 13-Jun-2006 yamt

branches: 1.71.2;
remove unnecessary arguments from kauth_authorize_process.
ie. make it similar to the one found in apple TN.


# 1.70 13-Jun-2006 yamt

sysctl_kern_file, sysctl_kern_file2: don't abuse kauth_authorize_process
for non-process objects.


# 1.69 13-Jun-2006 yamt

sysctl_kern_file2: fix an indent.


Revision tags: yamt-pdpolicy-base5 simonb-timecounters-base
# 1.68 14-May-2006 elad

branches: 1.68.2;
integrate kauth.


Revision tags: elad-kernelauth-base
# 1.67 17-Apr-2006 elad

Move securelevel-specific stuff to its own file.


# 1.66 14-Apr-2006 blymn

Make i/o statistics collection more generic, include tape drives and
nfs mounts in the set of devices that statistics will be reported on.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.65 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.64 26-Mar-2006 erh

When DIAGNOSTIC is defined, provide a kern.panic_now sysctl to conveniently
and reliably panic the system


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.63 01-Mar-2006 yamt

branches: 1.63.2; 1.63.4; 1.63.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.62 04-Feb-2006 yamt

for some random places, use PNBUF_GET/PUT rather than
- on-stack buffer
- malloc(MAXPATHLEN)


# 1.61 02-Feb-2006 elad

branches: 1.61.2;
implement a security.setid_core node as discussed on tech-kern@ and
tech-security@.


# 1.60 27-Jan-2006 elad

branches: 1.60.2;
remove security node sysctl objects; they are now created using CTL_CREATE.


# 1.59 26-Dec-2005 perry

branches: 1.59.2;
u_intN_t -> uintN_t


# 1.58 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.57 05-Dec-2005 christos

- make settime take timespec.
- avoid wrapping of time in settime.
- pass struct proc down so that we can log a detailed message.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.56 08-Oct-2005 yamt

sysctl_kern_proc_args: don't assume that the process is
resident while we are sleeping.


# 1.55 07-Sep-2005 elad

Implement curtain in KERN_{PROC,PROC2,FILE,FILE2,PROC_ARGS}.
While I'm here, disable curtain by default.


# 1.54 07-Sep-2005 elad

Introduce ``security.curtain'', new node for security features and
settings, and new variable for controlling access to objects based
on user-id.


# 1.53 06-Sep-2005 rpaulo

Implement kern.hardclock_ticks.


# 1.52 24-Aug-2005 simonb

Fix a tyop in a comment.


# 1.51 13-Aug-2005 blymn

Remove the tape stats from here, they caused issues on non-scsipi
architectures.


# 1.50 08-Aug-2005 blymn

Don't include tape stats functions if no devices configured.


# 1.49 07-Aug-2005 blymn

Add tape statistics gathering functions.


# 1.48 29-Jul-2005 elad

#ifdef VERIFIED_EXEC


# 1.47 16-Jul-2005 christos

defopt verified_exec.


# 1.46 17-Jun-2005 atatat

branches: 1.46.2;
Comment in new cp_id implementation was wrong since I abandoned
rewriting it in favor of some testing and then never got back to it.
It's better now.


# 1.45 16-Jun-2005 christos

Add a new sysctl 'cp_id' that returns the array of cpu id values. Requested by
me, implemented by atatat.


# 1.44 15-Jun-2005 elad

Fix sysctl handling for raise-only variables. This affected the veriexec
node entirely. Reported by Nino Dehne.


# 1.43 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.42 06-Jun-2005 jdc

Revert previous ('_ncpus' is now 'ncpus' again).
MI variable names have precedence.


# 1.41 05-Jun-2005 jdc

Rename 'ncpus' to '_ncpus', otherwise we shadow sparc/sparc64's 'ncpus'
when MULTIPROCESSOR is defined.


# 1.40 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.39 22-May-2005 elad

Add indication for number of fingerprinted files on each device.

When a table is created for a new device, a new variable is created
under the kern.veriexec.count node named "dev_<id>". For example,
dev_0, dev_3, etc.


# 1.38 19-May-2005 elad

Some changes in veriexec.

New features:

- Add a veriexec_report() routine to make most reporting consistent and
remove some common code.
- Add 'strict' mode that controls how veriexec behaves.
- Add sysctl knobs:
o kern.veriexec.verbose controls verbosity levels. Value: 0, 1.
o kern.veriexec.strict controls strict level. Values: 0, 1, 2. See
documentation in sysctl(3) for details.
o kern.veriexec.algorithms returns a string with a space separated
list of supported hashing algorithms in veriexec.
- Updated documentation in man pages for sysctl(3) and sysctl(8).

Bug fixes:

- veriexec_removechk(): Code cleanup + handle FINGERPRINT_NOTEVAL
correctly.
- exec_script(): Don't pass 0 as flag when executing a script; use the
defined VERIEXEC_INDIRECT - which is 1. Makes indirect execution
enforcement work.
- Fix some printing formats and types.


Revision tags: kent-audio2-base
# 1.37 18-Apr-2005 mrg

be explicit in the description for POSIX saved set-id that this is for
POSIX-style, not sane-style. (ie, add "POSIX " to the description.)


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.36 11-Mar-2005 atatat

branches: 1.36.2;
Revert the change that made kern.file2 and net.*.*.pcblist into nodes
instead of structs. It had other deleterious side-effects that are
rather nasty. Another solution must be found.


# 1.35 10-Mar-2005 atatat

Change types of kern.file2 and net.*.*.pcblist to NODE


# 1.34 09-Mar-2005 atatat

Add kern.file2. As kern.proc2 is to kern.proc, so is kern.file2 to
kern.file, namely a 32/64 bit clean sysctl interface to the same data.
It also borrows a few things from struct vnode (if applicable) and
from struct proc, just to tie things together a bit more.

You can walk this list "by file" or "by pid". The former method is
similar to kern.file but omits the filehead, and the latter can give
you duplicates if multiple processes have the same struct file open,
but tells you which process it is.


# 1.33 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.32 01-Oct-2004 yamt

branches: 1.32.4; 1.32.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.31 27-Jul-2004 atatat

branches: 1.31.2;
The message buffer datum instrumented by KERN_MSGBUFSIZE is actually a
long, not an int, and this causes "problems" on LP64be machines
(sparc64, etc). Assign the value to a temporary int and instrument
that instead. Should be fine until someone wants a message buffer
larger than two gigabytes.
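
A minimal sketch of the failure mode, not the kernel code itself. The
helper names are hypothetical, and the big-endian layout is simulated with
a byte array so the example behaves the same on any host. It shows why
reading a 64-bit long through an int-sized window on an LP64 big-endian
machine returns the high-order half (usually zero), and how copying into a
temporary int avoids that.

```c
#include <stdint.h>

/*
 * Hypothetical helper: lay out a 64-bit value in big-endian byte order and
 * read back only the first four bytes, as happens when a long is exported
 * as if it were an int on an LP64 big-endian machine.
 */
int32_t
demo_read_be_long_as_int(int64_t value)
{
	unsigned char bytes[8];
	uint32_t first_word = 0;
	int i;

	for (i = 0; i < 8; i++)
		bytes[i] = (unsigned char)((uint64_t)value >> (56 - 8 * i));
	for (i = 0; i < 4; i++)
		first_word = (first_word << 8) | bytes[i];
	return (int32_t)first_word;
}

/* The approach the commit describes: assign to a temporary int first. */
int
demo_narrow_to_int(long value)
{
	int tmp = (int)value;

	return tmp;
}
```

With a 16384-byte buffer size, the first helper yields 0 while the second
yields 16384, as long as the value fits in an int.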


# 1.30 26-May-2004 christos

(off_t)(long) is wrong when it comes to kernel addresses [because on a 32 bit
machine if the high bit is set they turn negative]. Make an intermediate cast
to unsigned long.
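
A small sketch of the cast problem, under stated assumptions: int32_t and
uint32_t stand in for long and unsigned long on a 32-bit machine, int64_t
stands in for off_t, and the function names are hypothetical.

```c
#include <stdint.h>

/*
 * Widen through a signed 32-bit type. Converting an out-of-range value to
 * int32_t is implementation-defined in C; mainstream compilers wrap, so an
 * address with the high bit set becomes negative and is then sign-extended.
 */
int64_t
demo_widen_via_signed(uint32_t kaddr)
{
	return (int64_t)(int32_t)kaddr;
}

/* Widen through an unsigned 32-bit type: the value is zero-extended. */
int64_t
demo_widen_via_unsigned(uint32_t kaddr)
{
	return (int64_t)(uint32_t)kaddr;
}
```

For an address such as 0xc0000000 the signed path gives a negative result,
while the unsigned path preserves the address.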


# 1.29 03-May-2004 martin

Fix a comment.
Approved by Andrew Brown.


# 1.28 23-Apr-2004 simonb

s/the the/the/ (only in sources that aren't regularly imported from
elsewhere).


# 1.27 16-Apr-2004 atatat

Prefer that kern.hostid is printed in hex, not as a signed decimal,
and avoid accidental sign-extension when setting it.


# 1.26 08-Apr-2004 atatat

Lots of sysctl descriptions (if someone wants to help out here, that
would be good) mostly copied from sysctl(3). This takes care of the
top-level, most of kern.* and hw.* (modulo the ath and bge stuff), and
all of proc.*.

If you don't want the added rodata in your kernel, use "options
SYSCTL_NO_DESCR" in your kernel config.


# 1.25 08-Apr-2004 atatat

Clear out the struct kinfo_drivers before stuffing things into it.
Avoids leaking garbage from the stack (left over from the earlier
call to sysctl_locate()).
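
The pattern can be sketched as follows. The structure and function here
are hypothetical stand-ins, not the real struct kinfo_drivers. The point
is only that zeroing the whole record first keeps unused bytes (tail of
the name buffer, padding) from carrying old stack contents out to the
caller.

```c
#include <string.h>

/* Hypothetical record that would be copied out to userland. */
struct demo_kinfo {
	char name[16];
	int unit;
};

void
demo_fill_kinfo(struct demo_kinfo *k, const char *name, int unit)
{
	size_t n = strlen(name);

	/* Clear everything first so no stale bytes survive. */
	memset(k, 0, sizeof(*k));

	/* Copy at most sizeof(name) - 1 bytes; the memset supplies the NUL. */
	if (n > sizeof(k->name) - 1)
		n = sizeof(k->name) - 1;
	memcpy(k->name, name, n);
	k->unit = unit;
}
```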


Revision tags: netbsd-2-0-base
# 1.24 24-Mar-2004 atatat

branches: 1.24.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.23 17-Mar-2004 yamt

- move kern.somaxkva sysctl stuff from init_sysctl.c to uipc_socket.c.
- when changing its value, wakeup sokva waiters.


# 1.22 21-Feb-2004 atatat

Use KERN_PROCSLOP for struct kinfo_proc and KERN_LWPSLOP for
struct kinfo_lwp, and not vice versa.

Should solve the issue with top dying because it's unable to "allocate
memory".


# 1.21 19-Feb-2004 atatat

Use new PTRTOUINT64() macro instead of local PTRTOINT64() macro.


# 1.20 17-Jan-2004 atatat

Avoid dereferencing l...it might be NULL


# 1.19 28-Dec-2003 atatat

Sysctl functions called for "generic" nodes should forward "query"
requests (where possible), rather than returning errors.


# 1.18 28-Dec-2003 atatat

Adjust error returns in kern.cp_time when a specific processor is
being requested so that (1) the uniprocessor case and the
multiprocessor case are more similar and (2) so that we return ENOENT
when a non-existent processor is requested (which is both more
sensible and follows the general order of things anyway).


# 1.17 28-Dec-2003 atatat

Rename sysctl_kern_hostname() to sysctl_setlen() and use it also for
domainname. Note that there's no need to copy rnode since we're not
changing any of it, nor protecting anything from change.

Thanks to martin for initial work.


# 1.16 28-Dec-2003 atatat

RCSid police


# 1.15 28-Dec-2003 martin

After changing hostname, adjust hostnamelen.
This closes PR kern/23907.


# 1.14 26-Dec-2003 martin

Make kern.rtc_offset writable at securelevel <= 0.
This allows boot-time adjustment when a machine runs other OSes with
RTC == localtime.


# 1.13 20-Dec-2003 yamt

update a comment to match with the previous change (rev.1.12).


# 1.12 20-Dec-2003 yamt

restore functionality to decrease kern.maxvnodes which
has been backed out during sysctl rework.


# 1.11 12-Dec-2003 simonb

In sysctl_kern_lwp adjust offsets into the mib entries so that
they are now correct. Fixes problems with "ps -s" not working.
Also use KERN_LWPSLOP instead of KERN_PROCSLOP.

Both changes from Andrew Brown.


# 1.10 10-Dec-2003 atatat

Make kern.dump_on_panic writeable again, too


# 1.9 09-Dec-2003 atatat

Make kern.sbmax writeable again as well.

From a follow-on to PR kern/23695 by a Mr. Davis, which I missed at a
quick glance.


# 1.8 09-Dec-2003 atatat

Make kern.logsigexit writeable again.

Fixes PR kern/23695.


# 1.7 07-Dec-2003 martin

Add missing break.


# 1.6 07-Dec-2003 he

Also make declaration of sysctl_kern_maxptys() depend on NPTY > 0.
Makes the mvme68k RAMDISK kernel compile again.


# 1.5 06-Dec-2003 martin

Fix kern.cp_time for MULTIPROCESSOR kernels: calculate size of result
correctly, free original instead of incremented pointer, copy results for
n = -2 case too, so top shows correct stats.
Additionally, rearrange code for better readability (from Andrew).


# 1.4 06-Dec-2003 fvdl

Include opt_posix.h for the P1003_1B_SEMAPHORE define.
Include <machine/cpu.h> just to be sure.


# 1.3 06-Dec-2003 martin

We can not count CPUs at sysctl initialization time - so don't make
hw.ncpu an immediate value.


# 1.2 06-Dec-2003 atatat

#include "opt_multiprocessor.h"

This makes hw.ncpu and kern.cp_time work better on those platforms.


# 1.1 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.212 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.211 31-May-2016 pgoyette

Add a new kern.messages sysctl to allow kernel message verbosity to be
altered after boot.

Fixes PR kern/46539 using patch submitted by Nat Sloss.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.210 09-Nov-2015 pgoyette

Whether or not the semaphore code is loaded as a module or built-in, its
sysctl data belongs with the module code. Move it from kern/init_sysctl.c
to kern/uipc_sem.c

While here, add a new sysctl variable kern.posix.semcnt (current count of
semaphores) to complement the existing kern.posix.semmax (maximum number
of semaphores).


Revision tags: nick-nhusb-base-20150921
# 1.209 25-Aug-2015 pooka

Move a bunch of sysctl nodes from init_sysctl (kitchen sink sysctl file)
to init_sysctl_base (only base kernel defs). Main motivation was to
fix sysconf(_SC_NPROCESSORS) for Rumprun. As reported by neeraj on irc,
it returned -1 before this fix, so we were doing imaginary computing.


# 1.208 07-Jul-2015 justin

Move hw.machine and hw.machine_arch sysctls to base so rump can use them

This allows uname(3) and uname(1) to work on rump kernels.


Revision tags: nick-nhusb-base-20150606
# 1.207 20-May-2015 pooka

group msgbuf sysctls with the msgbuf code
(init_sysctl.c -> subr_log.c)


# 1.206 13-May-2015 pgoyette

More preparation for modularizing the SYSVxxx options. Here we
change the kern.ipc.sysvxxx sysctls into dynamic values, so each
sub-component of SYSVxxx can declare its own availability.


# 1.205 22-Apr-2015 pooka

move clock sysctls from init_sysctl.c to kern_clock.c


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.204 03-Aug-2014 apb

branches: 1.204.4;
BUILDINFO part 2: expose sysctl kern.buildinfo


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.203 08-May-2014 hannken

Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15
# 1.202 24-Mar-2014 christos

branches: 1.202.2;
- create cpu_{g,s}etmodel() and hide cpu_model from direct access.


Revision tags: riastradh-drm2-base3
# 1.201 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.200 25-Feb-2014 justin

Add kern.{ostype,osrelease,osrevision,version} kern.domainname,
kern.rawpartition sysctl support to rump kernel.
Moved the sysctl support that is shared between rump and normal
kernels to init_sysctl_base.c as rump cannot use init_sysctl.c
in order to avoid code duplication. Agreed with pooka@.


# 1.199 17-Jan-2014 pooka

Put cprng sysctls into subr_cprng.c. Also, make sysctl_prng static
in subr_cprng and get rid of SYSCTL_PRIVATE namespace leak macro.

Fixes ping(8) when run against a standalone rump kernel due to appearance
of the kern.urandom sysctl node (in case someone was wondering ...)


# 1.198 14-Sep-2013 joerg

GC various arrays defined and used in kern_proc.c


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.197 18-Mar-2013 para

branches: 1.197.6;
calculate vnode cache size based on the resource it gets allocated from
this stops setting kern.maxvnodes too high so it exhausts available space in kmem

http://mail-index.netbsd.org/tech-kern/2013/03/08/msg015095.html


# 1.196 07-Mar-2013 matt

Add a kern.configname sysctl object.


# 1.195 21-Feb-2013 pgoyette

Move boottime50 and its associated sysctl into the compat module. As
noted on tech-kern. Should fix PR/47579.

OK christos@

Will request pull-up to 6.0 in a few days.


# 1.194 02-Feb-2013 matt

Make the inclusion of <sys/cprng.h> a private matter for sysctl. No reason
to expose the rest of the kernel to it.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.193 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


# 1.192 08-Oct-2012 pooka

put all kern socket sysctls in the same place


# 1.191 03-Oct-2012 mlelstv

Add sanity check to sysctl_kern_maxvnodes.


# 1.190 02-Jun-2012 dsl

branches: 1.190.2;
Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4
# 1.189 07-Apr-2012 christos

remove bogus check.


Revision tags: jmcneill-usbmp-base8 jmcneill-usbmp-base7
# 1.188 10-Mar-2012 joerg

P1003_1B_SEMAPHORE is no longer optional.


Revision tags: jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.187 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-0-5-RELEASE netbsd-6-0-4-RELEASE netbsd-6-0-3-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.186 17-Dec-2011 tls

branches: 1.186.2;

Separate /dev/random pseudodevice implementation from kernel entropy pool
implementation. Rewrite pseudodevice code to use cprng_strong(9).

The new pseudodevice is cloning, so each caller gets bits from a stream
generated with its own key. Users of /dev/urandom get their generators
keyed on a "best effort" basis -- the kernel will rekey generators
whenever the entropy pool hits the high water mark -- while users of
/dev/random get their generators rekeyed every time key-length bits
are output.

The underlying cprng_strong API can use AES-256 or AES-128, but we use
AES-128 because of concerns about related-key attacks on AES-256. This
improves performance (and reduces entropy pool depletion) significantly
for users of /dev/urandom but does cause users of /dev/random to rekey
twice as often.

Also fixes various bugs (including some missing locking and a reseed-counter
overflow in the CTR_DRBG code) found while testing this.

For long reads, this generator is approximately 20 times as fast as the
old generator (dd with bs=64K yields 53MB/sec on 2GHz Core2 instead of
2.5MB/sec) and also uses a separate mutex per instance so concurrency
is greatly improved. For reads of typical key sizes for modern
cryptosystems (16-32 bytes) performance is about the same as the old
code: a little better for 32 bytes, a little worse for 16 bytes.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base
# 1.185 20-Nov-2011 tls

branches: 1.185.2;
An undocumented behavior of the sysctl kern.arandom node used to allow
sucking up to 8192 bytes out of the kernel arc4random() generator at a
time. Supposedly some very old application code uses this to rekey
other instances of RC4 in userspace (a truly great idea). Reduce the
limit to 256 bytes -- and note that it will probably be reduced to
sizeof(int) in the future, since this node is so documented.


# 1.184 19-Nov-2011 tls

First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.


Revision tags: jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.183 30-Aug-2011 bouyer

branches: 1.183.2;
Add getlabelusesmbr(), as proposed in
http://mail-index.netbsd.org/tech-userlevel/2011/08/25/msg005404.html
This is used by disk tools such as disklabel(8) to dynamically decide if
the underlying platform uses a disklabel-in-mbr-partition or not
(instead of using a compile-time list of ports).
getlabelusesmbr() reads the sysctl kern.labelusesmbr, which takes its value
from the machdep #define LABELUSESMBR.
For evbmips, make LABELUSESMBR 1 if the platform uses pmon
as bootloader, and 0 (the previous value) otherwise.


# 1.182 23-Jul-2011 jym

When KERN_SA is not defined, kern.no_sa_support is a constant (1). So
add CTLFLAG_IMMEDIATE to flags. Make the macro block logically reversed so
it looks more natural when reading.

Reported by Peter Tworek on tech-kern@.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.181 24-May-2011 joerg

Add some needed __UNCONST


# 1.180 02-Apr-2011 rmind

vfs_drainvnodes: drop lwp argument, remove variable name in prototype.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.179 05-Feb-2011 christos

avoid code duplication.


# 1.178 28-Jan-2011 pooka

migrate compat32 handling with previous

pointed out by Lars Heidieker


# 1.177 28-Jan-2011 pooka

Move sysctl routines from init_sysctl.c to kern_descrip.c (for
descriptors) and kern_proc.c (for processes). This makes them
usable in a rump kernel, in case somebody was wondering.


# 1.176 22-Jan-2011 christos

Use the L_ flags instead of the P_ flags for lwps.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.175 01-Jul-2010 rmind

branches: 1.175.2; 1.175.4;
Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.


# 1.174 16-Jun-2010 pooka

Set kinfo_lwp to 0 before filling it so that if someone removes
variable assignments from here, kernel memory does not leak to
userspace.

Bug found, a little bit surprisingly, by the atf ps test, which failed
because the -o holdcnt column was too wide due to the contents displayed
being garbage.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.173 13-Feb-2010 yamt

branches: 1.173.2;
sysctl_doeproc: don't follow a possibly stale pointer.


Revision tags: uebayasi-xip-base
# 1.172 13-Jan-2010 pooka

branches: 1.172.2;
Minimize unnecessary differences in rump.


# 1.171 24-Dec-2009 elad

When reporting open files using sysctl, don't use 'filehead' to fetch files,
as we don't have a process context to authorize on. Instead, traverse the
file descriptor table of each process -- as we already do in one case.

Introduce a "marker" we can use to mark files we've seen in an iteration, as
the same file can be referenced more than once.

Hopefully this availability of filtering by process also makes life easier
for those who are interested in implementing process "containers" etc.


Revision tags: matt-premerge-20091211
# 1.170 12-Dec-2009 dsl

Report L_INMEM in the lwp info as well.


# 1.169 12-Dec-2009 dsl

Always set L_INMEM to maintain binary compatibility.


Revision tags: jym-xensuspend-nbase
# 1.168 21-Oct-2009 rmind

Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


# 1.167 16-Sep-2009 pooka

Chop init_sysctl into base nodes (init_sysctl_base.c) and the
kitchen sink (init_sysctl.c). Further surgery may be needed down
the line.


Revision tags: yamt-nfs-mp-base8
# 1.166 11-Sep-2009 apb

Expose the kernel's boothowto(9) variable through the sysctl
kern.boothowto variable.

Part of the /etc/rc silent changes requested in PR 41946
and proposed in tech-userlevel.


Revision tags: yamt-nfs-mp-base7
# 1.165 16-Aug-2009 christos

provide compatibility for the older variant of kern.consdev, which used
a 32 bit dev_t. Reported by mrg.


Revision tags: jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.164 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


# 1.163 16-May-2009 yamt

sysctl_doeproc:
- simplify.
- KERN_PROC: fix possible stale proc pointer dereference.
- KERN_PROC: don't do copyout with proc_lock held.


Revision tags: yamt-nfs-mp-base4 jym-xensuspend-base
# 1.162 12-May-2009 yamt

don't forget to skip marker processes.


# 1.161 04-May-2009 yamt

sysctl_doeproc: fix a bug in rev.1.135.
don't forget to mark our marker process PK_MARKER.
this fixes crashes in sched_pstats, etc.


Revision tags: yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.160 29-Mar-2009 mrg

- add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.

- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.

- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)

- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)

- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.

- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but that it's best left
as part of the full-update of compat/freebsd.)


this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.

tested on i386 and sparc64, build tested on several other platforms.

thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)


# 1.159 11-Mar-2009 mrg

like KERN_FILE2: *do* update "needed" when there is no count. we want
userland to know what sort of size to provide..

while here, slightly normalise the previous to init_sysctl.c.


# 1.158 11-Mar-2009 mrg

always calculate "needed" for KERN_FILE2 calls. this allows a caller
to get an estimate of the needed space, like the intention is.


# 1.157 08-Mar-2009 ad

Don't bother with file_t::f_iflags any more, as it's not used.
Noted by mrg@.


Revision tags: nick-hppapmap-base2
# 1.156 13-Feb-2009 apb

Use "defopt MODULAR" in sys/conf/files, and #include "opt_modular.h"
in all kernel sources that use the MODULAR option.
Proposed in tech-kern on 18 Jan 2009.


Revision tags: mjf-devfs2-base
# 1.155 17-Jan-2009 cegger

branches: 1.155.2;
whitespace nit


# 1.154 17-Jan-2009 yamt

malloc -> kmem_alloc.


# 1.153 11-Jan-2009 christos

merge christos-time_t


Revision tags: christos-time_t-nbase christos-time_t-base
# 1.152 29-Dec-2008 pooka

Rename specfs_lock as device_lock and move it from specfs to devsw.
Relaxes kernel dependency on vfs.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.151 28-Nov-2008 elad

PR/40002: Daniel Horecki: sockstat doesn't work for user with sysctl
security.curtain=1

If the kauth call failed, we'd silently continue the loop, but the error
code would remain and eventually "leak" to userspace. Reset the error to
zero when continuing.

Tested by snj@ and myself. Okay snj@.
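
A sketch of the bug pattern described above, with hypothetical names
throughout. demo_authorize() merely stands in for a kauth check and is not
the kauth(9) API. The loop skips entries the caller may not see; without
the reset, a denial on the final entry would become the return value of
the whole request.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an authorization check: owner-only access. */
static int
demo_authorize(int viewer_uid, int owner_uid)
{
	return viewer_uid == owner_uid ? 0 : EPERM;
}

/* Count the entries visible to viewer_uid, skipping the ones denied. */
int
demo_count_visible(const int *owners, size_t n, int viewer_uid,
    size_t *countp)
{
	int error = 0;
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		error = demo_authorize(viewer_uid, owners[i]);
		if (error != 0) {
			/* Skipping is not a failure: do not leak EPERM. */
			error = 0;
			continue;
		}
		count++;
	}
	*countp = count;
	return error;
}
```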


# 1.150 12-Nov-2008 ad

Allow the POSIX semaphore code to be loaded as a module.


Revision tags: netbsd-5-base matt-mips64-base2
# 1.149 22-Oct-2008 ad

branches: 1.149.2; 1.149.4;
Set kern.posix_semaphores at runtime so it can be a module.
(Picked wrong header the last time.)


# 1.148 22-Oct-2008 ad

Set kern.posix_semaphores at runtime so it can be a module.


Revision tags: haad-dm-base1
# 1.147 19-Oct-2008 christos

rename proc_representative_lwp to proc_active_lwp and clarify it is for
ps display purposes. suggested by rmind.


# 1.146 19-Oct-2008 christos

Select a "representative" lwp instead of the first lwp in the list. The
first lwp in the list is the last created and in the firefox and gtk-gnash
case this is usually a zombie, so the status in ps was ZLl. This now picks
the lwp in order ONPROC > RUN > SLEEP > STOP > SUSPENDED > IDL > DEAD > ZOMB
and breaks ties using cpticks.
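
The selection order can be sketched roughly as below. The types and names
are hypothetical and do not mirror struct lwp, and the assumption that the
LWP with more cpticks wins a tie is mine; the commit message only says that
cpticks breaks ties.

```c
#include <stddef.h>

/* Hypothetical states, listed from most to least preferred. */
enum demo_lstat {
	DEMO_ONPROC, DEMO_RUN, DEMO_SLEEP, DEMO_STOP,
	DEMO_SUSPENDED, DEMO_IDL, DEMO_DEAD, DEMO_ZOMB
};

struct demo_lwp {
	enum demo_lstat stat;
	unsigned int cpticks;
};

/* Return the preferred LWP for display, or NULL if the list is empty. */
const struct demo_lwp *
demo_pick_lwp(const struct demo_lwp *lwps, size_t n)
{
	const struct demo_lwp *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct demo_lwp *l = &lwps[i];

		/* Better state wins; equal state falls back to cpticks. */
		if (best == NULL || l->stat < best->stat ||
		    (l->stat == best->stat && l->cpticks > best->cpticks))
			best = l;
	}
	return best;
}
```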


# 1.145 15-Oct-2008 wrstuden

Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 simonb-wapbl-nbase simonb-wapbl-base
# 1.144 15-Jul-2008 christos

make l_flags contain more stuff. Fixes top thread display where system processes
were always displayed.


# 1.143 02-Jul-2008 rmind

branches: 1.143.2;
Remove proc_representative_lwp(), use a simple LIST_FIRST() instead.
OK by <ad>.


Revision tags: wrstuden-revivesa-base-1 yamt-pf42-base4 wrstuden-revivesa-base
# 1.142 16-Jun-2008 ad

PR kern/38927: processes getting stuck in uvm_map (cv_timedwait), hanging
machine

Assume that a vnode (and associated data structures) costs 2kB in the
worst imaginable case. Don't allow sysctl to set desiredvnodes to a
value that would use more than 75% of KVA or 75% of physical memory.
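
As a worked sketch of that arithmetic (hypothetical names, and not
necessarily how the kernel computes it): take the smaller of KVA and
physical memory, keep 75% of it, and divide by the assumed 2kB per vnode.

```c
#include <stdint.h>

/* Assumed worst-case cost of one vnode, per the commit message. */
#define DEMO_VNODE_COST	2048u

/* Largest vnode count that stays within 75% of both KVA and RAM. */
uint64_t
demo_max_vnodes(uint64_t kva_bytes, uint64_t physmem_bytes)
{
	uint64_t limit = kva_bytes < physmem_bytes ? kva_bytes : physmem_bytes;

	return (limit / 4 * 3) / DEMO_VNODE_COST;
}

/* Nonzero if a requested desiredvnodes value would be accepted. */
int
demo_vnodes_allowed(uint64_t requested, uint64_t kva_bytes,
    uint64_t physmem_bytes)
{
	return requested <= demo_max_vnodes(kva_bytes, physmem_bytes);
}
```

With 1GB of KVA and 4GB of RAM, the KVA is the tighter bound: 75% of 1GB
is 768MB, which at 2kB each allows 393216 vnodes.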


# 1.141 16-Jun-2008 ad

- PPWAIT need only be locked by proc_lock, so move it to proc::p_lflag.
- Remove a few needless lock acquires from exec/fork/exit.
- Sprinkle branch hints.

No functional change.


Revision tags: yamt-pf42-base3
# 1.140 31-May-2008 ad

branches: 1.140.2;
Kill devsw_lock and just use specfs_lock. The two would need merging
in order to prevent unload of modules when a device that they provide
is still open.


# 1.139 25-May-2008 christos

don't forget to fill in the emulation.


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.138 12-May-2008 ad

Use cpu_index(), not ci_cpuid.


# 1.137 30-Apr-2008 ad

branches: 1.137.2;
KERN_FILE_BYPID: fix locking botch.


# 1.136 29-Apr-2008 ad

Don't try grabbing a zombie's p_reflock.


# 1.135 29-Apr-2008 ad

PR kern/37917 /bin/ps no longer shows zombies


# 1.134 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.133 24-Apr-2008 ad

branches: 1.133.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.132 24-Apr-2008 ad

Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.131 05-Apr-2008 yamt

branches: 1.131.2;
- l_wmesg is not always valid. check l_wchan when using l_wmesg.
should fix a crash reported by Juan RP on current-users@.
- ttyinfo: lock lwp when accessing l_wmesg.
- fill_lwp: add an assertion.


# 1.130 04-Apr-2008 cegger

use device_xname() where appropriate
OK martin


# 1.129 02-Apr-2008 xtraeme

Revert rev 1.126-1.128. The original code was correct and rmind and I
didn't look correctly at them.


# 1.128 01-Apr-2008 xtraeme

When copying l_name and l_wmesg use KI_LNAMELEN and KI_WMESGLEN
respectively, so that we don't care if l_name/wmesg is longer
than kl_name/wmesg and the KASSERTs added in previous can go away.


# 1.127 01-Apr-2008 xtraeme

Fix previous: use the length of l->l_foo not kl->l_foo and add
two KASSERTs to check for max length limits before copying.

As suggested by rmind@.


# 1.126 01-Apr-2008 xtraeme

fill_lwp: when copying l_wmesg and l_name, use the size of the string
not of the variable.

Found and ok by rmind@.


# 1.125 27-Mar-2008 ad

branches: 1.125.2;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.124 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.123 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.122 30-Jan-2008 ad

branches: 1.122.2; 1.122.6;
Another locking botch.


# 1.121 28-Jan-2008 ad

More file/proc locking fixes.


Revision tags: bouyer-xeni386-nbase
# 1.120 23-Jan-2008 elad

Tons of process scope changes.

- Add a KAUTH_PROCESS_SCHEDULER action, to handle scheduler related
requests, and add specific requests for set/get scheduler policy and
set/get scheduler parameters.

- Add a KAUTH_PROCESS_KEVENT_FILTER action, to handle kevent(2) related
requests.

- Add a KAUTH_DEVICE_TTY_STI action to handle requests to TIOCSTI.

- Add requests for the KAUTH_PROCESS_CANSEE action, indicating what
process information is being looked at (entry itself, args, env,
open files).

- Add requests for the KAUTH_PROCESS_RLIMIT action indicating set/get.

- Add requests for the KAUTH_PROCESS_CORENAME action indicating set/get.

- Make bsd44 secmodel code handle the newly added requests appropriately.

All of the above make it possible to issue finer-grained kauth(9) calls in
many places, removing some KAUTH_GENERIC_ISSUSER requests.

- Remove the "CAN" from KAUTH_PROCESS_CAN{KTRACE,PROCFS,PTRACE,SIGNAL}.

Discussed with christos@ and yamt@.


Revision tags: bouyer-xeni386-base
# 1.119 12-Jan-2008 ad

sysctl_kern_proc_args: avoid zero length allocation.


Revision tags: matt-armv6-base
# 1.118 07-Jan-2008 ad

Patch up sysctl locking:

- Lock processes, credentials, filehead etc correctly.
- Acquire a read hold on sysctl_treelock if only doing a query.
- Don't wire down the output buffer. It doesn't work correctly and the code
regularly does long term sleeps with it held - it's not worth it.
- Don't hold locks other than sysctl_lock while doing copyout().
- Drop sysctl_lock while doing copyout / allocating memory in a few places.
- Don't take kernel_lock for sysctl.
- Fix a number of bugs spotted along the way


# 1.117 31-Dec-2007 ad

Remove systrace. Ok core@.


# 1.116 26-Dec-2007 christos

Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.


Revision tags: vmlocking2-base3
# 1.115 22-Dec-2007 yamt

use binuptime for l_stime/l_rtime.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.114 10-Dec-2007 elad

- Use KAUTH_ARG() instead of casts,
- Don't ignore return value of settime() in sysctl_kern_rtc_offset(), as
suggested by yamt@.

Note: the kauth(9) call in sysctl_kern_rtc_offset() is bogus, but this will
be addressed separately.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base jmcneill-pm-base reinoud-bufcleanup-base
# 1.113 06-Nov-2007 ad

branches: 1.113.2; 1.113.4; 1.113.6;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


# 1.112 19-Oct-2007 ad

branches: 1.112.2;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h


Revision tags: yamt-x86pmap-base4
# 1.111 16-Oct-2007 christos

branches: 1.111.2;
Don't fail to produce the argument vector if the program has modified it
by deleting arguments. This is a popular practice, and failing means that
ps(1) prints (programname). For example, this is what XtOpenDisplay() does
with -geometry. This used to work before 2.0H, and the behavior is allowed
and hinted at by POSIX. Found out by Anon Ymous.


# 1.110 16-Oct-2007 christos

- fix comment sentence capitalization.
- whitespace cleanup.
No functional changes.


# 1.109 15-Oct-2007 ad

Add _SC_NPROCESSORS_ONLN and _SC_NPROCESSORS_CONF for sysconf(). These
are extensions but are provided by many Unix systems.
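
A minimal userland sketch of how a program might query these two values
through sysconf(3). The helper name cpu_count() is hypothetical and not part
of this change; the clamp to 1 is an assumption for callers that size
per-CPU arrays.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the number of online or configured CPUs.
 * sysconf() returns -1 when the name is unsupported, so clamp the
 * result to at least 1.
 */
static long
cpu_count(int name)
{
	long n;

	assert(name == _SC_NPROCESSORS_ONLN || name == _SC_NPROCESSORS_CONF);
	n = sysconf(name);
	return n < 1 ? 1 : n;
}
```

Calling cpu_count(_SC_NPROCESSORS_ONLN) gives the CPUs currently available
for scheduling, and cpu_count(_SC_NPROCESSORS_CONF) the CPUs configured.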


Revision tags: yamt-x86pmap-base3
# 1.108 13-Oct-2007 rmind

sysctl_kern_lwp: Use a correct variable when rechecking if LWP still
exists after relocking. Found via CID: 4689. OK by <dsl>.


Revision tags: vmlocking-base
# 1.107 08-Oct-2007 ad

Merge from vmlocking: don't hold scheduler locks across copyout().


Revision tags: yamt-x86pmap-base2
# 1.106 28-Sep-2007 joerg

Add kern.no_sa_support to easily detect whether a kernel supports
Scheduler Activation or not. This is a negative name as ld.so.conf
conditionals treat undefined sysctls like 0.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.105 15-Aug-2007 ad

branches: 1.105.2; 1.105.4;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.104 06-Aug-2007 yamt

branches: 1.104.2;
remove a homegrown definition of CPU_INFO_FOREACH.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.103 09-Jul-2007 ad

branches: 1.103.2; 1.103.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.102 30-Jun-2007 dsl

Add a flags parameter to kauth_cred_get/setgroups() so that sys_get/setgroups
can copy directly to/from userspace.
Avoids exposing the implementation of the group list as an array to code
outside kern_auth.c.
compat code and man page need updating.


# 1.101 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8
# 1.100 30-Apr-2007 dsl

Remove proc->p_ru and the 'rusage' pool.
I think it existed to cache the numbers in kernel memory of a zombie when
proc->p_stats was part of the 'u' area - so got freed earlier and wouldn't
(easily) be accessible from a separate process. However since both the
p_ru and p_stats fields are freed at the same time it is no longer needed.
Ride the recent 4.99.19 version change.


Revision tags: thorpej-atomic-base
# 1.99 11-Mar-2007 ad

branches: 1.99.2;
Add the LWP's runtime to kinfo_lwp.


# 1.98 09-Mar-2007 ad

branches: 1.98.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.


Revision tags: ad-audiomp-base
# 1.97 17-Feb-2007 pavel

Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.96 15-Feb-2007 ad

branches: 1.96.2;
Count the number of CPUs at boot and stash in 'ncpu'. Eventually should
have each CPU register at attach, so we can figure out the topology for
the scheduler.


Revision tags: post-newlock2-merge
# 1.95 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase newlock2-base
# 1.94 22-Jan-2007 elad

Don't rely on KAUTH_PROCESS_CANSEE for environment just yet,
otherwise we're allowing anyone to read the environment unless
curtain is enabled.

From yamt@.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 netbsd-4-base
# 1.93 27-Nov-2006 elad

branches: 1.93.2;
Move Veriexec's sysctl(9) setup routine and helper to kern_verifiedexec.c.


# 1.92 25-Nov-2006 christos

PR/34837: Mindaugas: Add SysV SHM dynamic reallocation and locking to the
physical memory


# 1.91 01-Nov-2006 christos

implement kern.arandom properly, instead of lying about it and only filling
the first 4 bytes of the array with random data.


# 1.90 29-Oct-2006 christos

add the emulation in kinfo_proc2


Revision tags: yamt-splraiseipl-base2
# 1.89 03-Oct-2006 elad

Back out previous (p_flag2).

In 30 minutes from now Jason Thorpe will come up with an implementation
of a proplib dictionary in struct proc, so adding an int doesn't really
make any sense.


# 1.88 03-Oct-2006 elad

Until we figure out the Perfect Way of adding flags to processes, add
a p_flag2. No objections on tech-kern@.

Input from simonb@, thanks!


# 1.87 24-Sep-2006 dogcow

correct dcopyout #define for !KTRACE case.


# 1.86 23-Sep-2006 manu

Add a -t+S flag to ktrace for tracing activity related to sysctl. MIB
names will be displayed, with data read and written as well.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.85 13-Sep-2006 elad

branches: 1.85.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.84 10-Sep-2006 manu

When getting the program argument or environment strings, we previously
assumed that all the strings were stored in a row, separated by NUL chars,
at the address pointed to by argv[0] (or envp[0]).

This was wrong: if the program changed argv[0], we still read the
first string correctly, but the next strings did contain unexpected data.

The fix: read the whole argv (or envp) array, then copy the strings one by
one, using their addresses in argv (or envp).
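
The idea of following each pointer in the vector, rather than assuming the
strings sit back to back after the first one, can be sketched in userland C.
This is only an illustration: pack_vector() is a hypothetical name, and the
kernel code copies from the target process' address space rather than
calling memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: pack the strings referenced by vec[] into buf as
 * consecutive NUL-terminated strings, reading each string from its own
 * address. Returns the number of bytes written, or 0 if buf is too small
 * (an empty vector also yields 0).
 */
static size_t
pack_vector(const char *const vec[], size_t nvec, char *buf, size_t buflen)
{
	size_t off = 0;

	assert(vec != NULL && buf != NULL);
	for (size_t i = 0; i < nvec; i++) {
		size_t len = strlen(vec[i]) + 1;	/* include the NUL */

		if (len > buflen - off)
			return 0;
		memcpy(buf + off, vec[i], len);
		off += len;
	}
	return off;
}
```

Because every string is located through its own vector entry, a program that
repoints argv[0] at a new title does not corrupt the strings that follow.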


# 1.83 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: rpaulo-netinet-merge-pcb-base
# 1.82 08-Sep-2006 manu

When collecting a 32-bit process' argument or environment vector, we need
to convert 32-bit pointers to the 64-bit environment


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.81 26-Jul-2006 dogcow

branches: 1.81.4;
at the request of elad, as veriexec.h has returned, revert the changes
from 2006-07-25.


# 1.80 25-Jul-2006 dogcow

mechanically go through and
s,include "veriexec.h",include <sys/verified_exec.h>,
as the former has apparently gone away.


# 1.79 24-Jul-2006 elad

some fixes:
- adapt to NVERIEXEC in init_sysctl.c.
- we now need "veriexec.h" for NVERIEXEC.
- "opt_verified_exec.h" -> "opt_veriexec.h", and include it only where
it is needed.


# 1.78 23-Jul-2006 ad

Use the LWP cached credentials where sane.


# 1.77 17-Jul-2006 ad

- Don't cast kauth_cred_t to (struct ucred *), just set pc_ucred = NULL.
- Fill ucred::cr_ref.


# 1.76 16-Jul-2006 elad

CURTAIN() -> KAUTH_GENERIC_CANSEE.


# 1.75 14-Jul-2006 elad

move security.setid_core.* to kern.coredump.setid.*, as requested by yamt@.


Revision tags: yamt-pdpolicy-base6
# 1.74 21-Jun-2006 christos

Don't leak memory on success. Allocate only the type of struct that we'll
need for efficiency.


# 1.73 20-Jun-2006 christos

don't allocate too much stuff on the stack.


Revision tags: chap-midi-nbase chap-midi-base
# 1.72 17-Jun-2006 yamt

sysctl_security_setidcorename: don't allocate MAXPATHLEN bytes on stack.


Revision tags: gdamore-uart-base
# 1.71 13-Jun-2006 yamt

branches: 1.71.2;
remove unnecessary arguments from kauth_authorize_process.
ie. make it similar to the one found in apple TN.


# 1.70 13-Jun-2006 yamt

sysctl_kern_file, sysctl_kern_file2: don't abuse kauth_authorize_process
for non-process objects.


# 1.69 13-Jun-2006 yamt

sysctl_kern_file2: fix an indent.


Revision tags: yamt-pdpolicy-base5 simonb-timecounters-base
# 1.68 14-May-2006 elad

branches: 1.68.2;
integrate kauth.


Revision tags: elad-kernelauth-base
# 1.67 17-Apr-2006 elad

Move securelevel-specific stuff to its own file.


# 1.66 14-Apr-2006 blymn

Make i/o statistics collection more generic, include tape drives and
nfs mounts in the set of devices that statistics will be reported on.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3
# 1.65 01-Apr-2006 christos

PR/32809: Pavel Cahyna: Conflicting flags in l_flag and p_flag are causing
ps(1) to print incorrect information. Annotate the flags in the header files
to make sure that flags are not being re-used and move flags so that there
are no conflicts.


# 1.64 26-Mar-2006 erh

When DIAGNOSTIC is defined, provide a kern.panic_now sysctl to conveniently
and reliably panic the system


Revision tags: peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base
# 1.63 01-Mar-2006 yamt

branches: 1.63.2; 1.63.4; 1.63.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.


Revision tags: yamt-uio_vmspace-base5
# 1.62 04-Feb-2006 yamt

for some random places, use PNBUF_GET/PUT rather than
- on-stack buffer
- malloc(MAXPATHLEN)


# 1.61 02-Feb-2006 elad

branches: 1.61.2;
implement a security.setid_core node as discussed on tech-kern@ and
tech-security@.


# 1.60 27-Jan-2006 elad

branches: 1.60.2;
remove security node sysctl objects; they are now created using CTL_CREATE.


# 1.59 26-Dec-2005 perry

branches: 1.59.2;
u_intN_t -> uintN_t


# 1.58 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.57 05-Dec-2005 christos

- make settime take timespec.
- avoid wrapping of time in settime.
- pass struct proc down so that we can log a detailed message.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.56 08-Oct-2005 yamt

sysctl_kern_proc_args: don't assume that the process is
resident while we are sleeping.


# 1.55 07-Sep-2005 elad

Implement curtain in KERN_{PROC,PROC2,FILE,FILE2,PROC_ARGS}.
While I'm here, disable curtain by default.


# 1.54 07-Sep-2005 elad

Introduce ``security.curtain'', new node for security features and
settings, and new variable for controlling access to objects based
on user-id.


# 1.53 06-Sep-2005 rpaulo

Implement kern.hardclock_ticks.


# 1.52 24-Aug-2005 simonb

Fix a tyop in a comment.


# 1.51 13-Aug-2005 blymn

Remove the tape stats from here, they caused issues on non-scsipi
architectures.


# 1.50 08-Aug-2005 blymn

Don't include tape stats functions if no devices configured.


# 1.49 07-Aug-2005 blymn

Add tape statistics gathering functions.


# 1.48 29-Jul-2005 elad

#ifdef VERIFIED_EXEC


# 1.47 16-Jul-2005 christos

defopt verified_exec.


# 1.46 17-Jun-2005 atatat

branches: 1.46.2;
Comment in new cp_id implementation was wrong since I abandoned
rewriting it in favor of some testing and then never got back to it.
It's better now.


# 1.45 16-Jun-2005 christos

Add a new sysctl 'cp_id' that returns the array of cpu id values. Requested by
me, implemented by atatat.


# 1.44 15-Jun-2005 elad

Fix sysctl handling for raise-only variables. This affected the veriexec
node entirely. Reported by Nino Dehne.


# 1.43 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.42 06-Jun-2005 jdc

Revert previous ('_ncpus' is now 'ncpus' again).
MI variable names have precedence.


# 1.41 05-Jun-2005 jdc

Rename 'ncpus' to '_ncpus', otherwise we shadow sparc/sparc64's 'ncpus'
when MULTIPROCESSOR is defined.


# 1.40 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.39 22-May-2005 elad

Add indication for number of fingerprinted files on each device.

When a table is created for a new device, a new variable is created
under the kern.veriexec.count node named "dev_<id>". For example,
dev_0, dev_3, etc.


# 1.38 19-May-2005 elad

Some changes in veriexec.

New features:

- Add a veriexec_report() routine to make most reporting consistent and
remove some common code.
- Add 'strict' mode that controls how veriexec behaves.
- Add sysctl knobs:
o kern.veriexec.verbose controls verbosity levels. Value: 0, 1.
o kern.veriexec.strict controls strict level. Values: 0, 1, 2. See
documentation in sysctl(3) for details.
o kern.veriexec.algorithms returns a string with a space separated
list of supported hashing algorithms in veriexec.
- Updated documentation in man pages for sysctl(3) and sysctl(8).

Bug fixes:

- veriexec_removechk(): Code cleanup + handle FINGERPRINT_NOTEVAL
correctly.
- exec_script(): Don't pass 0 as flag when executing a script; use the
defined VERIEXEC_INDIRECT - which is 1. Makes indirect execution
enforcement work.
- Fix some printing formats and types.


Revision tags: kent-audio2-base
# 1.37 18-Apr-2005 mrg

be explicit in the description for POSIX saved set-id that this is for
POSIX-style, not sane-style. (ie, add "POSIX " to the description.)


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.36 11-Mar-2005 atatat

branches: 1.36.2;
Revert the change that made kern.file2 and net.*.*.pcblist into nodes
instead of structs. It had other deleterious side-effects that are
rather nasty. Another solution must be found.


# 1.35 10-Mar-2005 atatat

Change types of kern.file2 and net.*.*.pcblist to NODE


# 1.34 09-Mar-2005 atatat

Add kern.file2. As kern.proc2 is to kern.proc, so is kern.file2 to
kern.file, namely a 32/64 bit clean sysctl interface to the same data.
It also borrows a few things from struct vnode (if applicable) and
from struct proc, just to tie things together a bit more.

You can walk this list "by file" or "by pid". The former method is
similar to kern.file but omits the filehead, and the latter can give
you duplicates if multiple processes have the same struct file open,
but tells you which process it is.


# 1.33 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.32 01-Oct-2004 yamt

branches: 1.32.4; 1.32.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.


# 1.31 27-Jul-2004 atatat

branches: 1.31.2;
The message buffer datum instrumented by KERN_MSGBUFSIZE is actually a
long, not an int, and this causes "problems" on LP64be machines
(sparc64, etc). Assign the value to a temporary int and instrument
that instead. Should be fine until someone wants a message buffer
larger than two gigabytes.


# 1.30 26-May-2004 christos

(off_t)(long) is wrong when it comes to kernel addresses [because on a 32 bit
machine if the high bit is set they turn negative]. Make an intermediate cast
to unsigned long.
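
The sign-extension problem can be reproduced with fixed-width types, which
stand in for a 32-bit long and a 64-bit off_t. The helper names below are
hypothetical and only illustrate the two cast chains.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Widen a 32-bit address through a signed type, as (off_t)(long) does on
 * an ILP32 machine. The conversion to int32_t is implementation-defined
 * for values above INT32_MAX; on common compilers it wraps, so the
 * widened result is negative when bit 31 is set.
 */
static int64_t
widen_signed(uint32_t addr)
{
	return (int64_t)(int32_t)addr;
}

/*
 * Widen through an unsigned intermediate instead: the value is
 * zero-extended and can never become negative.
 */
static int64_t
widen_unsigned(uint32_t addr)
{
	int64_t r = (int64_t)addr;

	assert(r >= 0);
	return r;
}
```

With a typical kernel address such as 0xc0000000, the signed chain yields a
negative offset while the unsigned chain preserves the address.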


# 1.29 03-May-2004 martin

Fix a comment.
Approved by Andrew Brown.


# 1.28 23-Apr-2004 simonb

s/the the/the/ (only in sources that aren't regularly imported from
elsewhere).


# 1.27 16-Apr-2004 atatat

Prefer that kern.hostid is printed in hex, not as a signed decimal,
and avoid accidental sign-extension when setting it.


# 1.26 08-Apr-2004 atatat

Lots of sysctl descriptions (if someone wants to help out here, that
would be good) mostly copied from sysctl(3). This takes care of the
top-level, most of kern.* and hw.* (modulo the ath and bge stuff), and
all of proc.*.

If you don't want the added rodata in your kernel, use "options
SYSCTL_NO_DESCR" in your kernel config.


# 1.25 08-Apr-2004 atatat

Clear out the struct kinfo_drivers before stuffing things into it.
Avoids leaking garbage from the stack (left over from the earlier
call to sysctl_locate()).


Revision tags: netbsd-2-0-base
# 1.24 24-Mar-2004 atatat

branches: 1.24.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.23 17-Mar-2004 yamt

- move kern.somaxkva sysctl stuff from init_sysctl.c to uipc_socket.c.
- when changing its value, wakeup sokva waiters.


# 1.22 21-Feb-2004 atatat

Use KERN_PROCSLOP for struct kinfo_proc and KERN_LWPSLOP for
struct kinfo_lwp, and not vice versa.

Should solve the issue with top dying because it's unable to "allocate
memory".


# 1.21 19-Feb-2004 atatat

Use new PTRTOUINT64() macro instead of local PTRTOINT64() macro.


# 1.20 17-Jan-2004 atatat

Avoid dereferencing l...it might be NULL


# 1.19 28-Dec-2003 atatat

Sysctl functions called for "generic" nodes should forward "query"
requests (where possible), rather than returning errors.


# 1.18 28-Dec-2003 atatat

Adjust error returns in kern.cp_time when a specific processor is
being requested so that (1) the uniprocessor case and the
multiprocessor case are more similar and (2) so that we return ENOENT
when a non-existent processor is requested (which is both more
sensible and follows the general order of things anyway).


# 1.17 28-Dec-2003 atatat

Rename sysctl_kern_hostname() to sysctl_setlen() and use it also for
domainname. Note that there's no need to copy rnode since we're not
changing any of it, nor protecting anything from change.

Thanks to martin for initial work.


# 1.16 28-Dec-2003 atatat

RCSid police


# 1.15 28-Dec-2003 martin

After changing hostname, adjust hostnamelen.
This closes PR kern/23907.


# 1.14 26-Dec-2003 martin

Make kern.rtc_offset writable at securelevel <= 0.
This allows boot-time adjustment when a machine runs other OSes with
RTC == localtime.


# 1.13 20-Dec-2003 yamt

update a comment to match with the previous change (rev.1.12).


# 1.12 20-Dec-2003 yamt

restore functionality to decrease kern.maxvnodes which
has been backed out during sysctl rework.


# 1.11 12-Dec-2003 simonb

In sysctl_kern_lwp adjust offsets into the mib entries so that
they are now correct. Fixes problems with "ps -s" not working.
Also use KERN_LWPSLOP instead of KERN_PROCSLOP.

Both changes from Andrew Brown.


# 1.10 10-Dec-2003 atatat

Make kern.dump_on_panic writeable again, too


# 1.9 09-Dec-2003 atatat

Make kern.sbmax writeable again as well.

From a follow-on to PR kern/23695 by a Mr. Davis, which I missed at a
quick glance.


# 1.8 09-Dec-2003 atatat

Make kern.logsigexit writeable again.

Fixes PR kern/23695.


# 1.7 07-Dec-2003 martin

Add missing break.


# 1.6 07-Dec-2003 he

Also make declaration of sysctl_kern_maxptys() depend on NPTY > 0.
Makes the mvme68k RAMDISK kernel compile again.


# 1.5 06-Dec-2003 martin

Fix kern.cp_time for MULTIPROCESSOR kernels: calculate size of result
correctly, free original instead of incremented pointer, copy results for
n = -2 case too, so top shows correct stats.
Additionally, rearrange code for better readability (from Andrew).


# 1.4 06-Dec-2003 fvdl

Include opt_posix.h for the P1003_1B_SEMAPHORE define.
Include <machine/cpu.h> just to be sure.


# 1.3 06-Dec-2003 martin

We can not count CPUs at sysctl initialization time - so don't make
hw.ncpu an immediate value.


# 1.2 06-Dec-2003 atatat

#include "opt_multiprocessor.h"

This makes hw.ncpu and kern.cp_time work better on those platforms.


# 1.1 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.