History log of /freebsd-9.3-release/sys/kern/subr_param.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 267654 19-Jun-2014 gjb

Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 257433 31-Oct-2013 kib

MFC r257221:
Fix typo.


# 254515 19-Aug-2013 andre

MFC a bundle of commits that bring autotuning to mbufs, maxfiles/sockets
and maxusers to the 9-stable branch. It is committed as bundle because
these patches build on each other and only provide the functionality in
their entirety. Some are bug fixes to aspects of earlier commits.

MFC r242029 (alfred):

Allow autotune maxusers > 384 on 64 bit machines.

MFC r242847 (alfred):

Allow maxusers to scale on machines with large address space.

MFC r243631 (andre):

Base the mbuf related limits on the available physical memory or
kernel memory, whichever is lower. The overall mbuf related memory
limit must be set so that mbufs (and clusters of various sizes)
can't exhaust physical RAM or KVM.

At the same time divorce maxfiles from maxusers and set maxfiles to
physpages / 8 with a floor based on maxusers. This way busy servers
can make use of the significantly increased mbuf limits with a much
larger number of open sockets.

MFC r243639 (andre):

Complete r243631 by applying the remainder of kern_mbuf.c that got
lost while merging into the commit tree.

MFC r243668 (andre):

Using a long is the wrong type to represent the realmem and maxmbufmem
variable as they may overflow on i386/PAE and i386 with > 2GB RAM.

MFC r243995, r243996, r243997 (pjd):

Style cleanups, Make use of the fact that uma_zone_set_max(9) already
returns actual limit set.

MFC r244080 (andre):

Prevent long type overflow of realmem calculation on ILP32 by forcing
calculation to be in quad_t space. Fix style issue with second parameter
to qmin().

MFC r245469 (alfred):

Do not autotune ncallout to be greater than 18508.

MFC r245575 (andre):

Move the mbuf memory limit calculations from init_param2() to
tunable_mbinit() where it is next to where it is used later.

MFC r246207 (andre):

Remove unused VM_MAX_AUTOTUNE_NMBCLUSTERS define.

MFC r249843 (andre):

Base the calculation of maxmbufmem in part on kmem_map size
instead of kernel_map size to prevent kernel memory exhaustion
by mbufs and a subsequent panic on physical page allocation
failure.

MFC r253204 (andre):

Fix style issues, a typo in "kern.ipc.nmbufs" and correctly plave and
expose the value of the tunable maxmbufmem as "kern.ipc.maxmbufmem"
through sysctl.

MFC r253207 (andre):

Make use of the fact that uma_zone_set_max(9) already returns the
rounded limit making a call to uma_zone_get_max(9) unnecessary.

Tested by: alfred (iXsystems)


# 253250 12-Jul-2013 grehan

MFC r245066
-------------------------------------------------------------------------
Teach the kernel to recognize that it is executing inside a bhyve virtual
machine.
-------------------------------------------------------------------------

This will help a 9.2 guest to run more effectively as a bhyve guest.

Reviewed by: neel
Approved by: re


# 251897 18-Jun-2013 scottl

Merge the second part of the unmapped I/O changes. This enables the
infrastructure in the block layer and UFS filesystem as well as a few
drivers. The list of MFC revisions is long, so I won't quote changelogs.

r248508,248510,248511,248512,248514,248515,248516,248517,248518,
248519,248520,248521,248550,248568,248789,248790,249032,250936

Submitted by: kib
Approved by: kib
Obtained from: Netflix


# 246400 06-Feb-2013 zont

MFC r245457:
- Detect when we are in KVM.


# 240501 14-Sep-2012 zont

MFC r240026:
- Make kern.maxtsiz, kern.dfldsiz, kern.maxdsiz, kern.dflssiz, kern.maxssiz
and kern.sgrowsiz sysctls writable.

MFC r240068:
- Mark some sysctls with CTLFLAG_TUN flag instead of CTLFLAG_RDTUN.

MFC r240069:
- After r240026 sgrowsiz should be used in a safer maner.


# 239582 22-Aug-2012 kib

MFC r239301:
Add a sysctl kern.pid_max, which limits the maximum pid the system is
allowed to allocate, and corresponding tunable with the same
name. Note that existing processes with higher pids are left intact.

MFC r239328:
Fix grammar.

MFC r239329:
As a safety measure, disable lowering pid_max too much.


# 225736 22-Sep-2011 kensmith

Copy head to stable/9 as part of 9.0-RELEASE release cycle.

Approved by: re (implicit)


# 219920 23-Mar-2011 alc

Modestly increase the maximum allowed size of the kmem map on i386.
Also, express this new maximum as a fraction of the kernel's address
space size rather than a constant so that increasing KVA_PAGES will
automatically increase this maximum. As a side-effect of this change,
kern.maxvnodes will automatically increase by a proportional amount.

While I'm here ensure that this change doesn't result in an unintended
increase in maxpipekva on i386. Calculate maxpipekva based upon the
size of the kernel address space and the amount of physical memory
instead of the size of the kmem map. The memory backing pipes is not
allocated from the kmem map. It is allocated from its own submap of
the kernel map. In short, it has no real connection to the kmem map.
(In fact, the commit messages for the maxpipekva auto-sizing talk
about using the kernel map size, cf. r117325 and r117391, even though
the implementation actually used the kmem map size.) Although the
calculation is now done differently, the resulting value for
maxpipekva should remain almost the same on i386. However, on amd64,
the value will be reduced by 2/3. This is intentional. The recent
change to VM_KMEM_SIZE_SCALE on amd64 for the benefit of ZFS also had
the unnecessary side-effect of increasing maxpipekva. This change is
effectively restoring maxpipekva on amd64 to its prior value.

Eliminate init_param3() since it is no longer used.


# 217688 21-Jan-2011 pluknet

Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize.

Submitted by: perryh pluto.rain.com (previous version)
Reviewed by: jhb
Approved by: kib (mentor)
Tested by: universe


# 210935 06-Aug-2010 csjp

Add Xen to the list of virtual vendors. In the non PV (HVM) case this fixes
the virtualization detection successfully disabling the clflush instruction.
This fixes insta-panics for XEN hvm users when the hw.clflush_disable
tunable is -1 or 0 (-1 by default).

Discussed with: jhb


# 209492 23-Jun-2010 nwhitehorn

Reverse the logic of the if statement that sets the default value of
HZ; the list of 1000 Hz platforms was getting unwieldy.

Suggested by: marcel


# 209490 23-Jun-2010 nwhitehorn

Move default HZ from 100 to 1000 on powerpc.

Reviewed by: marcel
MFC after: 2 weeks


# 204611 02-Mar-2010 ivoras

Document the VM detection type and sysctl a bit better.


# 204420 27-Feb-2010 alc

When running as a guest operating system, the FreeBSD kernel must assume
that the virtual machine monitor has enabled machine check exceptions.
Unfortunately, on AMD Family 10h processors the machine check hardware
has a bug (Erratum 383) that can result in a false machine check exception
when a superpage promotion occurs. Thus, I am disabling superpage
promotion when the FreeBSD kernel is running as a guest operating system
on an AMD Family 10h processor.

Reviewed by: jhb, kib
MFC after: 3 days


# 204278 24-Feb-2010 brooks

Don't inforce an upper bound on kern.ngroups. The INT_MAX-1 limit was
too high due to several overflows. The actual limit is somewhere in the
neighborhood of INT_MAX/4 on 64-bit machines, but most systems could not
support such a limit due to a lack of memory and the cost of duplicate
credentials.

Reported by: bde


# 202143 12-Jan-2010 brooks

Replace the static NGROUPS=NGROUPS_MAX+1=1024 with a dynamic
kern.ngroups+1. kern.ngroups can range from NGROUPS_MAX=1023 to
INT_MAX-1. Given that the Windows group limit is 1024, this range
should be sufficient for most applications.

MFC after: 1 month


# 195430 07-Jul-2009 silby

Increase HZ_VM from 10 to 100. While 10 hz saves cpu time
under VM environments, it's too slow for FreeBSD to work
properly. For example, ping at 10hz pings about every 600ms
instead of about every second.

Approved by: re (kib)


# 190331 23-Mar-2009 jhb

Improve the description of a few sysctls.

Submitted by: bde (partially)
MFC after: 3 days


# 189745 12-Mar-2009 jhb

Change the sysctls for maxbcache and maxswzone from int to long. I missed
this earlier since these sysctls don't exist in 7.x yet.


# 189744 12-Mar-2009 jhb

Export the current values of nbuf, ncallout, and nswbuf via read-only
sysctls that match the tunable names.

MFC after: 3 days


# 189649 10-Mar-2009 jhb

- Make maxpipekva a signed long rather than an unsigned long as overflow
is more likely to be noticed with signed types.
- Make amountpipekva a long as well to match maxpipekva.

Discussed with: bde


# 189595 09-Mar-2009 jhb

Adjust some variables (mostly related to the buffer cache) that hold
address space sizes to be longs instead of ints. Specifically, the follow
values are now longs: runningbufspace, bufspace, maxbufspace,
bufmallocspace, maxbufmallocspace, lobufspace, hibufspace, lorunningspace,
hirunningspace, maxswzone, maxbcache, and maxpipekva. Previously, a
relatively small number (~ 44000) of buffers set in kern.nbuf would result
in integer overflows resulting either in hangs or bogus values of
hidirtybuffers and lodirtybuffers. Now one has to overflow a long to see
such problems. There was a check for a nbuf setting that would cause
overflows in the auto-tuning of nbuf. I've changed it to always check and
cap nbuf but warn if a user-supplied tunable would cause overflow.

Note that this changes the ABI of several sysctls that are used by things
like top(1), etc., so any MFC would probably require a some gross shims
to allow for that.

MFC after: 1 month


# 186619 30-Dec-2008 ivoras

Document the relationship between enum VM_GUEST and the vm_guest_sysctl_names
array.

Approved by: gnn (original version)


# 186522 27-Dec-2008 bz

Hide detect_virtual() along with the accompanying string
arrays under #ifndef XEN to make XEN config compile again.
In case of Xen vm_guest is hard coded.

Move the list for the vm_guest sysctl out of the restictive
bounds as the sysctl is there in either case.


# 186286 18-Dec-2008 ivoras

By popular request, stringify kern.vm_guest sysctl. Now it returns a
short, self-documenting string describing the detected virtual
environment.

Approved by: gnn (mentor) (earlier version)


# 186252 17-Dec-2008 ivoras

Introduce a sysctl kern.vm_guest that reflects what the kernel knows about
it running under a virtual environment. This also introduces a globally
accessible variable vm_guest that can be used where appropriate in the
kernel to inspect this environment.

To make it easier for the long run, an enum VM_GUEST is also introduced,
which could possibly be factored out in a header somewhere (but the
question is where - vm/vm_param.h? sys/param.h?) so it eventually becomes
a part of the standard KPI. In any case, it's a start.

The purpose of all this isn't to absolutely detect that the OS is running
under a virtual environment (cf. "redpill") but to allow the parts of the
kernel and the userland that care about this particular aspect and can do
something useful depending on it to have a standardised interface. Reducing
kern.hz is one example but there are other things that could be done like
avoiding context switches, not using CPU instructions that are known to be
slow in emulation, possibly different strategies in VM (memory) allocation,
CPU scheduling, etc.

It isn't clear if the JAILS/VIMAGE functionality should also be exposed
by this particular mechanism (probably not since they're not "full"
virtual hardware environments). Sometime in the future another sysctl and
a variable could be introduced to reflect if the kernel supports any kind
of virtual hosting (e.g. VMWare VMI, Xen dom0).

Reviewed by: silence from src-commiters@, virtualization@, kmacy@
Approved by: gnn (mentor)
Security: Obscurity doesn't help.


# 185772 08-Dec-2008 jkim

- Detect Bochs BIOS variants and use HZ_VM as well.
- Free kernel environment variable after its use.
- Fix style(9) nits.


# 184326 27-Oct-2008 sobomax

vm_pnames should be "const char *const[]".

Submitted by: Christoph Mallon


# 184324 27-Oct-2008 sobomax

vm_pnames has no reason to be global.

MFC after: 2 weeks


# 184323 27-Oct-2008 sobomax

Default HZ value (1,000) on i386/amd64 is not very virtual machine friendly.
Due to the nature of the beast it causes lot of unproductive overhead. This
is especially bad when running SMP kernel on VMWare with several virtual
processors - idle FreeBSD guest with SMP kernel takes 150% host CPU time on my
dual-core MacBook Pro when I am enabling two virtual CPUs, making even host
not very usable. Detect when we are running in the sandbox and reduce HZ
to 10 (can be adjusted via VM_HZ in the kernel config) in such cases. This
brings host CPU usage of idle FreeBSD/SMP on two virtual processors down
to 10%.

Detect most popular VM platforms out there - VMWare, Parallels, VirtualBox
and VirtualPC.

MFC after: 2 weeks


# 180262 04-Jul-2008 alc

Correct an error in the comments for init_param3().

Discussed with: silby


# 178872 09-May-2008 pjd

- Export HZ value via kern.hz sysctl (this is the same name as for the
loader tunable).
- Document other sysctls in this file and also mark them as loader tunable
via CTLFLAG_RDTUN flag.

Reviewed by: roberto


# 172696 16-Oct-2007 alfred

Export maxswzone, maxbcache, maxtsiz, dfldsiz, maxdsiz, dflssiz, maxssiz,
and sgrowsiz via sysctl.

MFC after: 1 week


# 151369 16-Oct-2005 kris

Forced commit to note that the previous commit referred to r1.67, not r1.66

Pointed out by: bde


# 151342 14-Oct-2005 kris

Partially revert revision 1.66, which contained a change that did not
correspond to the commit log. It changed the maxswzone and maxbcache
parameters from int to long, without changing the extern definitions
in <sys/buf.h>.

In fact it's a good thing it did not, because other parts of the system
are not yet ready for this, and on large-memory sparc machines it causes
severe filesystem damage if you try.

The worst effect of the change was that the tunables controlling the
above variables stopped working. These were necessary to allow such
large sparc64 machines (with >12GB RAM) to boot, since sparc64 did not
set a hard-coded upper limit on these parameters and they ended
up overflowing an int, causing an infinite loop at boot in bufinit().

Reviewed by: mlaier


# 145154 16-Apr-2005 marius

Increase default HZ for sparc64 to 1000.


# 139804 06-Jan-2005 imp

/* -> /*- for copyright notices, minor format tweaks as necessary


# 138214 30-Nov-2004 bms

Fix the build.


# 138211 29-Nov-2004 peter

Switch from 1024hz to 1000hz on amd64 to match i386. 1024 is a bad
choice because it is so in sync with stathz (128hz or 4096hz etc).


# 137393 08-Nov-2004 des

#include <vm/vm_param.h> instead of <machine/vmparam.h> (the former
includes the latter, but also declares variables which are defined
in kern/subr_param.c).

Change som VM parameters from quad_t to unsigned long. They refer to
quantities (size limits for text, heap and stack segments) which must
necessarily be smaller than the size of the address space, so long is
adequate on all platforms.

MFC after: 1 week


# 137374 08-Nov-2004 marcel

Increase default HZ for ia64 to 1000.


# 137307 06-Nov-2004 phk

Increase default HZ for i386 to 1000


# 127911 05-Apr-2004 imp

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# 127612 30-Mar-2004 alc

White space and wording changes to init_param3().

Mostly submitted by: bde


# 127501 27-Mar-2004 alc

Revise the direct or optimized case to use uiomove_fromphys() by the reader
instead of ephemeral mappings using pmap_qenter() by the writer. The
writer is still, however, responsible for wiring the pages, just not
mapping them. Consequently, the allocation of KVA for the direct case is
unnecessary. Remove it and the sysctls limiting it, i.e.,
kern.ipc.maxpipekvawired and kern.ipc.amountpipekvawired. The number
of temporarily wired pages is still, however, limited by
kern.ipc.maxpipekva.

Note: On platforms lacking a direct virtual-to-physical mapping,
uiomove_fromphys() uses sf_bufs to cache ephemeral mappings. Thus,
the number of available sf_bufs can influence the performance of pipes
on platforms such i386. Surprisingly, I saw the greatest gain from this
change on such a machine: lmbench's pipe bandwidth result increased from
~1050MB/s to ~1850MB/s on my 2.4GHz, 400MHz FSB P4 Xeon.


# 126959 14-Mar-2004 peter

Set default HZ to 1024 for amd64. The comment in kern/tty.c doesn't
apply here because we have 64 bit longs and don't suffer the hz > 169
overflows.


# 118764 11-Aug-2003 silby

More pipe changes:

From alc:
Move pageable pipe memory to a seperate kernel submap to avoid awkward
vm map interlocking issues. (Bad explanation provided by me.)

From me:
Rework pipespace accounting code to handle this new layout, and adjust
our default values to account for the fact that we now have a solid
limit on allocations.

Also, remove the "maxpipes" limit, as it no longer has a purpose.
(The limit on kva usage solves the problem of having two many pipes.)


# 117391 10-Jul-2003 silby

Add init_param3() to subr_param. This function is called
immediately after the kernel map has been sized, and is
the optimal place for the autosizing of memory allocations
which occur within the kernel map to occur.

Suggested by: bde


# 117338 08-Jul-2003 silby

Pull in the entire kmem_map size calculation from kern_malloc, rather
than the shortcircuited version I had been using, which only worked
properly on i386 & amd64.

Also, change an autoscale constant to account for the more correct
kmem_map size.

Problem noticed by: mux


# 117325 08-Jul-2003 silby

Put some concrete limits on pipe memory consumption:

- Limit the total number of pipes so that we do not
exhaust all vm objects in the kernel map. When
this limit is reached, a ratelimited message will
be printed to the console.

- Put a soft limit on the amount of memory consumable
by pipes. Once the limit has been reached, all new
pipes will be limited to 4K in size, rather than the
default of 16K.

- Put a limit on the number of pages that may be used
for high speed page flipping in order to reduce the
amount of wired memory. Pipe writes that occur
while this limit is exceeded will fall back to
non-page flipping mode.

The above values are auto-tuned in subr_param.c and
are scaled to take into account both the size of
physical memory and the size of the kernel map.

These limits help to reduce the "kernel resources exhausted"
panics that could be caused by opening a large
number of pipes. (Pipes alone are no longer able
to exhaust all resources, but other kernel memory hogs
in league with pipes may still be able to do so.)

PR: 53627
Ideas / comments from: hsu, tjr, dillon@apollo.backplane.com
MFC after: 1 week


# 116182 10-Jun-2003 obrien

Use __FBSDID().


# 102600 30-Aug-2002 peter

Change hw.physmem and hw.usermem to unsigned long like they used to be
in the original hardwired sysctl implementation.

The buf size calculator still overflows an integer on machines with large
KVA (eg: ia64) where the number of pages does not fit into an int. Use
'long' there.

Change Maxmem and physmem and related variables to 'long', mostly for
completeness. Machines are not likely to overflow 'int' pages in the
near term, but then again, 640K ought to be enough for anybody. This
comes for free on 32 bit machines, so why not?


# 94754 15-Apr-2002 phk

Improve the implementation of adjtime(2).

Apply the change as a continuous slew rather than as a series of
discrete steps and make it possible to adjust arbitraryly huge
amounts of time in either direction.

In practice this is done by hooking into the same once-per-second
loop as the NTP PLL and setting a suitable frequency offset deducting
the amount slewed from the remainder. If the remaining delta is
larger than 1 second we slew at 5000PPM (5msec/sec), for a delta
less than a second we slew at 500PPM (500usec/sec) and for the last
one second period we will slew at whatever rate (less than 500PPM)
it takes to eliminate the delta entirely.

The old implementation stepped the clock a number of microseconds
every HZ to acheive the same effect, using the same rates of change.

Eliminate the global variables tickadj, tickdelta and timedelta and
their various use and initializations.

This removes the most significant obstacle to running timecounter and
NTP housekeeping from a timeout rather than hardclock.


# 91780 07-Mar-2002 silby

Unconditionally limit maxproc so that it is not possible
to exhaust all kmaps. The only reward for setting maxproc
to a value which will cause kmap exhaustion is a panic
during a forkbomb attack.

MFC after: 3 days


# 90274 05-Feb-2002 dillon

Allow the kern.maxusers boot tuneable to be set to 0 (previously only
the kernel config's maxusers could be set to 0 for autosizing to work).
Reviewed by: rwatson, imp
MFC after: 3 days


# 89769 24-Jan-2002 dillon

Make the 'maxusers 0' auto-sizing code slightly more conservative. Change
from 1 megabyte of ram per user to 2 megabytes of ram per user, and
reduce the cap from 512 to 384. 512 leaves around 240 MB of KVM available
while 384 leaves 270 MB of KVM available. Available KVM is important
in order to deal with zalloc and kernel malloc area growth.

Reviewed by: mckusick
MFC: either before 4.5 if re's agree, or after 4.5


# 87860 14-Dec-2001 peter

Proper fix for old config setting maxusers to 8.


# 87839 14-Dec-2001 dillon

Too many people are compiling kernels with maxusers set to 0 without the new
config. Hack the kernel to force auto-sizing if the old config is used.


# 87817 13-Dec-2001 silby

Limit maxprocperuid to 9/10 maxproc, and limit maxfilesperproc to 9/10
maxfiles. This should make local resource exhaustion attacks easier
to handle with a non-tweaked setup.

MFC after: 3 days


# 87546 08-Dec-2001 dillon

Allow maxusers to be specified as 0 in the kernel config, which will
cause the system to auto-size to between 32 and 512 depending on the
amount of memory.

MFC after: 1 week


# 86333 13-Nov-2001 dillon

Create a mutex pool API for short term leaf mutexes.
Replace the manual mutex pool in kern_lock.c (lockmgr locks) with the new API.
Replace the mutexes embedded in sxlocks with the new API.


# 84783 10-Oct-2001 ps

Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader
tunable.

Reviewed by: peter
MFC after: 2 weeks


# 81986 20-Aug-2001 dillon

Conditionalize VM_SWZONE_SIZE_MAX and VM_BCACHE_SIZE_MAX so MD sections
that don't define these constants don't break.


# 81933 19-Aug-2001 dillon

Limit the amount of KVM reserved for the buffer cache and for swap-meta
information. The default limits only effect machines with > 1GB of ram
and can be overriden with two new kernel conf variables VM_SWZONE_SIZE_MAX
and VM_BCACHE_SIZE_MAX, or with loader variables kern.maxswzone and
kern.maxbcache. This has the effect of leaving more KVM available for
sizing NMBCLUSTERS and 'maxusers' and should avoid tripups where a sysad
adds memory to a machine and then sees the kernel panic on boot due to
running out of KVM.

Also change the default swap-meta auto-sizing calculation to allocate half
of what it was previously allocating. The prior defaults were way too high.
Note that we cannot afford to run out of swap-meta structures so we still
stay somewhat conservative here.


# 80418 26-Jul-2001 peter

Move param.c out of the conf directory and make it fully dynamic.
Tunables are now derived at boot time from maxusers. ie: change maxusers
via a tunable and all the derivative settings change. You can change
the other tunables individually as well. Even hz etc is tunable.


# 78592 22-Jun-2001 bmilekic

Introduce numerous SMP friendly changes to the mbuf allocator. Namely,
introduce a modified allocation mechanism for mbufs and mbuf clusters; one
which can scale under SMP and which offers the possibility of resource
reclamation to be implemented in the future. Notable advantages:

o Reduce contention for SMP by offering per-CPU pools and locks.
o Better use of data cache due to per-CPU pools.
o Much less code cache pollution due to excessively large allocation macros.
o Framework for `grouping' objects from same page together so as to be able
to possibly free wired-down pages back to the system if they are no longer
needed by the network stacks.

Additional things changed with this addition:

- Moved some mbuf specific declarations and initializations from
sys/conf/param.c into mbuf-specific code where they belong.
- m_getclr() has been renamed to m_get_clrd() because the old name is really
confusing. m_getclr() HAS been preserved though and is defined to the new
name. No tree sweep has been done "to change the interface," as the old
name will continue to be supported and is not depracated. The change was
merely done because m_getclr() sounds too much like "m_get a cluster."
- TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and
systat(1) (see TODO below).
- Fixed systat(1) to display number of "free mbufs" based on new per-CPU
stat structures.
- Fixed netstat(1) to display new per-CPU stats based on sysctl-exported
per-CPU stat structures. All infos are fetched via sysctl.

TODO (in order of priority):

- Re-enable mbtypes statistics in both netstat(1) and systat(1) after
introducing an SMP friendly way to collect the mbtypes stats under the
already introduced per-CPU locks (i.e. hopefully don't use atomic() - it
seems too costly for a mere stat update, especially when other locks are
already present).
- Optionally have systat(1) display not only "total free mbufs" but also
"total free mbufs per CPU pool."
- Fix minor length-fetching issues in netstat(1) related to recently
re-enabled option to read mbuf stats from a core file.
- Move reference counters at least for mbuf clusters into an unused portion
of the cluster itself, to save space and need to allocate a counter.
- Look into introducing resource freeing possibly from a kproc.

Reviewed by (in parts): jlemon, jake, silby, terry
Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha)
Preliminary performance measurements: jlemon (and me, obviously)
URL: http://people.freebsd.org/~bmilekic/mb_alloc/


# 67899 29-Oct-2000 phk

Remove unneeded <stddef.h> #includes.


# 67046 12-Oct-2000 jasone

For lockmgr mutex protection, use an array of mutexes that are allocated
and initialized during boot. This avoids bloating sizeof(struct lock).
As a side effect, it is no longer necessary to enforce the assumtion that
lockinit()/lockdestroy() calls are paired, so the LK_VALID flag has been
removed.

Idea taken from: BSD/OS.


# 59839 01-May-2000 peter

Move the MSG* and SEM* options to opt_sysvipc.h
Remove evil allocation macros from machdep.c (why was that there???) and
use malloc() instead.
Move paramters out of param.h and into the code itself.
Move a bunch of internal definitions from public sys/*.h headers (without
#ifdef _KERNEL even) into the code itself.

I had hoped to make some of this more dynamic, but the cost of doing
wakeups on all sleeping processes on old arrays was too frightening.
The other possibility is to initialize on the first use, and allow
dynamic sysctl changes to parameters right until that point. That would
allow /etc/rc.sysctl to change SEM* and MSG* defaults as we presently
do with SHM*, but without the nightmare of changing a running system.


# 58820 30-Mar-2000 peter

Make sysv-style shared memory tuneable params fully runtime adjustable
via sysctl. It's done pretty simply but it should be quite adequate.
Also move SHMMAXPGS from $machine/include/vmparam.h as the comments that
went with it were wrong... we don't allocate KVM space for the pages so
that comment is bogus.. The only practical limit is how much physical
ram you want to lock up as this stuff isn't paged out or swap backed.


# 54478 12-Dec-1999 green

This is Bosko Milekic's mbuf allocation waiting code. Basically, this
means that running out of mbuf space isn't a panic anymore, and code
which runs out of network memory will sleep to wait for it.

Submitted by: Bosko Milekic <bmilekic@dsuper.net>
Reviewed by: green, wollman


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 48579 05-Jul-1999 msmith

Move the initialisation/tuning of nmbclusters from param.c/machdep.c
into uipc_mbuf.c. This reduces three sets of identical tunable code to
one set, and puts the initialisation with the mbuf code proper.

Make NMBUFs tunable as well.

Move the nmbclusters sysctl here as well.

Move the initialisation of maxsockets from param.c to uipc_socket2.c,
next to its corresponding sysctl.

Use the new tunable macros for the kern.vm.kmem.size tunable (this should have
been in a separate commit, whoops).


# 45515 09-Apr-1999 des

Allow setting MAXFILES in the kernel config.


# 41774 14-Dec-1998 dillon

Fixed problems with kernel config file overrides of sysv semaphore
parameters. Prior to this fix a kernel config override would effect
only some of the kernel files, resulting in panics.

PR: kern/9068


# 40931 05-Nov-1998 dg

Implemented zero-copy TCP/IP extensions via sendfile(2) - send a
file to a stream socket. sendfile(2) is similar to implementations in
HP-UX, Linux, and other systems, but the API is more extensive and
addresses many of the complaints that the Apache Group and others have
had with those other implementations. Thanks to Marc Slemko of the
Apache Group for helping me work out the best API for this.
Anyway, this has the "net" result of speeding up sends of files over
TCP/IP sockets by about 10X (that is to say, uses 1/10th of the CPU
cycles) when compared to a traditional read/write loop.


# 37565 11-Jul-1998 bde

Moved definition of fscale from param.c to kern_synch.c where it
should always have been (it has no user-servicable parts even at
compile time) and staticized it.


# 37315 30-Jun-1998 phk

Add 3 sysctl variables for future use by ps)1_


# 37087 21-Jun-1998 bde

Round tickadj up. This prevents tickadj from being 0 when HZ > 500,
which makes adjtime(2) useless and confuses xntpd(8) into refusing
to start even when it would use the kernel PLL instead of adjtime().
The result is the same as recommended by tickadj(8), at least when
HZ divides 10^6. Of course, you wouldn't want to actually use
adjtime() when HZ is large. In the silly boundary case of HZ == 10^6,
tickadj == tick == 1 so the clock stops while adjtime() is active.


# 36079 15-May-1998 wollman

Convert socket structures to be type-stable and add a version number.

Define a parameter which indicates the maximum number of sockets in a
system, and use this to size the zone allocators used for sockets and
for certain PCBs.

Convert PF_LOCAL PCB structures to be type-stable and add a version number.

Define an external format for infomation about socket structures and use
it in several places.

Define a mechanism to get all PF_LOCAL and PF_INET PCB lists through
sysctl(3) without blocking network interrupts for an unreasonable
length of time. This probably still has some bugs and/or race
conditions, but it seems to work well enough on my machines.

It is now possible for `netstat' to get almost all of its information
via the sysctl(3) interface rather than reading kmem (changes to follow).


# 33882 27-Feb-1998 guido

Raise ncallout from NPROC + 16 to NPROC + 16 + MAXFILES. This shold
prevent a possible DOS attack. The proper fix (to dynamically grow
the callout list) is in the make.
Submitted by: Paul Traina


# 26638 14-Jun-1997 bde

Removed unused #includes.


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 21770 16-Jan-1997 bde

Removed option EXTRAVNODES. All versions of FreeBSD-2.x have a sysctl
variable `kern.maxvnodes' which gives much better control over vnode
allocation than EXTRAVNODES (except in -current between 1995/10/28 and
1996/11/12, kern.maxvnodes was read-only and thus useless).


# 21737 15-Jan-1997 dg

Fix bug related to map entry allocations where a sleep might be attempted
when allocating memory for network buffers at interrupt time. This is due
to inadequate checking for the new mcl_map. Fixed by merging mb_map and
mcl_map into a single mb_map.

Reviewed by: wollman


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 16025 30-May-1996 peter

Add an option "EXTRA_VNODES" to cause an extra number of vnode structures
to be allocated at boot time. This is an expensive option, as they
consume physical ram and are not pageable etc. In certain situations,
this kind of option is quite useful, especially for news servers that
access a large number of directories at random and torture the name cache.
Defining 5000 or 10000 extra vnodes should cut down the amount of vnode
recycling somewhat, which should allow better name and directory caching
etc.

This is a "your mileage may vary" option, with no real indication of
what works best for your machine except trial and error. Too many will
cost you ram that you could otherwise use for disk buffers etc.

This is based on something John Dyson mentioned to me a while ago.


# 15722 10-May-1996 wollman

Allocate mbufs from a separate submap so that NMBCLUSTERS works as
expected.


# 15543 02-May-1996 phk

removed:
CLBYTES PD_SHIFT PGSHIFT NBPG PGOFSET CLSIZELOG2 CLSIZE pdei()
ptei() kvtopte() ptetov() ispt() ptetoav() &c &c
new:
NPDEPG

Major macro cleanup.


# 15538 02-May-1996 phk

First pass at cleaning up macros relating to pages, clusters and all that.


# 14526 11-Mar-1996 hsu

Merge in Lite2: proc LIST changes.
Reviewed by: david & bde


# 14328 02-Mar-1996 peter

Add more options into the conf/options and i386/conf/options.i386 files
and the #include hooks so that 'make depend' is more useful. This
covers most of the options I regularly use (but not all) and some other
easy ones.


# 13226 04-Jan-1996 wollman

Convert SYSV IPC to new-style options. (I hope I got everything...)
The LKMs will need an extra file, to come later.


# 12725 10-Dec-1995 phk

Last commit this round: Staticize.
we are now down to about 1146 symbols being global, of which
I estimate that about 100 are validly so.


# 9759 29-Jul-1995 bde

Eliminate sloppy common-style declarations. There should be none left for
the LINT configuation.


# 9371 29-Jun-1995 dg

Removed "GATEWAY" consideration when calculating number of mbuf clusters.
It now always uses the value that was used for the GATEWAY case.


# 9369 29-Jun-1995 dg

Killed "TIMEZONE" and "DST" options. They have been forced to 0 by config
for more than a year now. Moved the declaration of 'tz' into kern_time.c.


# 8747 25-May-1995 dg

Made "NMBCLUSTERS" calculation dynamic and fixed bogus use of "NMBCLUSTERS"
in machdep.c (it should use the global nmbclusters). Moved the calculation
of nmbclusters into conf/param.c (same place where nmbclusters has always
been assigned), and made the calculation include an extra amount based
on "maxusers". NMBCLUSTERS can still be overrided in the kernel config
file as always, but this change will make that generally unnecessary. This
fixes the "bug" reports from people who have misconfigured kernels seeing
the network "hang" when the mbuf cluster pool runs out.

Reviewed by: John Dyson


# 6577 20-Feb-1995 guido

Implement maxprocperuid and maxfilesperproc. They are tunable
via sysctl(8). The initial value of maxprocperuid is maxproc-1,
that of maxfilesperproc is maxfiles (untill maxfile will disappear)

Now it is at least possible to prohibit one user opening maxfiles

-Guido

Submitted by:
Obtained from:


# 6492 16-Feb-1995 joerg

Alow overriding of the various SHM* options.

Submitted by: Heikki Suonsivu <hsu@fx7.cs.hut.fi>


# 5530 12-Jan-1995 dg

Increase maxfiles to NPROC*2. This makes the per-process open file limit
highly bogus, however, and this needs to be fixed.


# 5455 09-Jan-1995 dg

These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.

The majority of the merged VM/cache work is by John Dyson.

The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.

vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.

vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.

vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.

vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.

vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.

pmap.c vm_map.c
Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of
kernel PTs.

vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.

proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.

swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.

machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.

machdep.c, kern_exec.c, vm_glue.c
Implemented a seperate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.

ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.

Submitted by: John Dyson and David Greenman


# 2729 13-Sep-1994 dfr

Added SYSV ipcs.

Obtained from: NetBSD and FreeBSD-1.1.5


# 1817 02-Aug-1994 dg

Added $Id$


# 1549 25-May-1994 rgrimes

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# 1541 24-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources