History log of /freebsd-9.3-release/sys/vm/swap_pager.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 267654 19-Jun-2014 gjb

Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 253257 12-Jul-2013 kib

MFC r253095:
Fix typo in comment.

Approved by: re (hrs)


# 251897 18-Jun-2013 scottl

Merge the second part of the unmapped I/O changes. This enables the
infrastructure in the block layer and UFS filesystem as well as a few
drivers. The list of MFC revisions is long, so I won't quote changelogs.

r248508,248510,248511,248512,248514,248515,248516,248517,248518,
248519,248520,248521,248550,248568,248789,248790,249032,250936

Submitted by: kib
Approved by: kib
Obtained from: Netflix


# 244547 21-Dec-2012 jh

MFC r243333:

- Don't pass geom and provider names as format strings.
- Add __printflike() attributes.
- Remove an extra argument for the g_new_geomf() call in swapongeom_ev().


# 240760 20-Sep-2012 alc

MFC r237168
The page flag PGA_WRITEABLE is set and cleared exclusively by the pmap
layer, but it is read directly by the MI VM layer. This change introduces
pmap_page_is_write_mapped() in order to completely encapsulate all direct
access to PGA_WRITEABLE in the pmap layer.


# 239789 28-Aug-2012 pluknet

MFC r239723:
Typo in previous change: print half the theoretical maximum as maximum
recommended amount.


# 239645 24-Aug-2012 des

MFH (r239327): warn when too much swap is configured, and avoid flooding
the console when running out of space for metadata.


# 232405 02-Mar-2012 ed

MFC r231378:

Remove direct access to si_name.

Code should just use the devtoname() function to obtain the name of a
character device. Also add const keywords to pieces of code that need it
to build properly.


# 231188 08-Feb-2012 mav

MFC 230877:
Fix NULL dereference panic on attempt to turn off (on system shutdown)
disconnected swap device.

This is quick and imperfect solution, as swap device will still be opened
and GEOM will not be able to destroy it. Proper solution would be to
automatically turn off and close disconnected swap device, but with existing
code it will cause panic if there is at least one page on device, even if
it is unimportant page of the user-level process. It needs some work.


# 229012 30-Dec-2011 kib

MFC r228432:
Fix printf.


# 225736 22-Sep-2011 kensmith

Copy head to stable/9 as part of 9.0-RELEASE release cycle.

Approved by: re (implicit)


# 225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


# 225418 06-Sep-2011 kib

Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by: alc, attilio
Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by: re (bz)


# 225089 22-Aug-2011 kib

Update some comments in swap_pager.c.

Reviewed and most wording by: alc
MFC after: 1 week
Approved by: re (bz)


# 225076 22-Aug-2011 kib

Apply the limit to avoid the overflows in the radix tree subr_blist.c
after the conversion of the swap device size to the page size units,
not before. That lifts the limit on the usable swap partition size
from 32GB to 256GB, that is less depressing for the modern systems.

Submitted by: Alexander V. Chernikov <melifaro ipfw ru>
Reviewed by: alc
Approved by: re (bz)
MFC after: 2 weeks


# 224582 01-Aug-2011 kib

Implement the linprocfs swaps file, providing information about the
configured swap devices in the Linux-compatible format.

Based on the submission by: Robert Millan <rmh debian org>
PR: kern/159281
Reviewed by: bde
Approved by: re (kensmith)
MFC after: 2 weeks


# 223825 06-Jul-2011 trasz

All the racct_*() calls need to happen with the proc locked. Fixing this
won't happen before 9.0. This commit adds "#ifdef RACCT" around all the
"PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order
to avoid useless locking/unlocking in kernels built without "options RACCT".


# 221096 26-Apr-2011 obrien

Reap old SPL comments.

Reviewed by: alc


# 220373 05-Apr-2011 trasz

Add accounting for most of the memory-related resources.

Sponsored by: The FreeBSD Foundation
Reviewed by: kib (earlier version)


# 219124 01-Mar-2011 brucec

Change the return type of vmspace_swap_count to a long to match the other
vmspace_*_count functions.

MFC after: 3 days


# 218966 23-Feb-2011 brucec

Calculate and return the count in vmspace_swap_count as a vm_offset_t
instead of an int to avoid overflow.

While here, clean up some style(9) issues.

PR: kern/152200
Reviewed by: kib
MFC after: 2 weeks


# 217529 18-Jan-2011 alc

Move the definition of M_VMPGDATA to the swap pager, where the only
remaining uses are.


# 216873 01-Jan-2011 brucec

There can be more than 0x20000000 swap meta blocks allocated if a swap-backed
md(4) device is used. Don't panic when deallocating such a device if swap
has been used.

PR: kern/133170
Discussed with: kib
MFC after: 3 days


# 216128 02-Dec-2010 trasz

Replace pointer to "struct uidinfo" with pointer to "struct ucred"
in "struct vm_object". This is required to make it possible to account
for per-jail swap usage.

Reviewed by: kib@
Tested by: pho@
Sponsored by: FreeBSD Foundation


# 214095 20-Oct-2010 avg

PG_BUSY -> VPO_BUSY, PG_WANTED -> VPO_WANTED in manual pages and comments

Reviewed by: alc
MFC after: 4 days


# 207822 09-May-2010 alc

Call vm_page_deactivate() rather than vm_page_dontneed() in
swp_pager_force_pagein(). By dirtying the page, swp_pager_force_pagein()
forces vm_page_dontneed() to insert the page at the head of the inactive
queue, just like vm_page_deactivate() does. Moreover, because the page
was invalid, it can't have been mapped, and thus the other effect of
vm_page_dontneed(), clearing the page's reference bits has no effect. In
summary, there is no reason to call vm_page_dontneed() since its effect
will be identical to calling the simpler vm_page_deactivate().


# 207806 08-May-2010 alc

Remove the page queues lock around a call to vm_page_activate(). Make the
page dirty before adding it to the active queue.


# 207796 08-May-2010 alc

Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free(). Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped(). (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.


# 207747 07-May-2010 alc

Eliminate unnecessary page queues locking.


# 207410 29-Apr-2010 kmacy

On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib


# 207364 29-Apr-2010 kib

In swap pager, do not free the non-requested pages from the run if they are
wired. Kstack pages are wired, this change prepares swap pager for handling
of long runs of kstack pages.

Noted and reviewed by: alc
Tested by: pho
MFC after: 2 weeks


# 206761 17-Apr-2010 alc

Setting PG_REFERENCED on the requested page in swap_pager_getpages() is
either redundant or harmful, depending on the caller. For example, when
called by vm_fault(), it is redundant. However, when called by
vm_thread_swapin(), it is harmful. Specifically, if the thread is later
swapped out, having PG_REFERENCED set on its stack pages leads the page
daemon to reactivate these stack pages and delay their reclamation.

Reviewed by: kib
MFC after: 3 weeks


# 198811 02-Nov-2009 ivoras

Add sysctl documentation strings. The descriptions are derived
from tuning(7). One of the descriptions references tuning(7) because
it is too complex to adequatly describe here (it is not a simple
boolean sysctl) and users should be warned to that.

Reviewed by: alc, kib
Approved by: gnn (mentor)


# 198201 18-Oct-2009 kib

Remove spurious call to priv_check(PRIV_VM_SWAP_NOQUOTA).
Call priv_check(PRIV_VM_SWAP_NORLIMIT) only when per-uid limit is
actually exceed.

Both changes aim at calling priv_check(9) only for the cases when
privilege is actually exercised by the process.

Reported and tested by: rwatson
Reviewed by: alc
MFC after: 3 days


# 194814 24-Jun-2009 kib

Initialize the uip to silence gcc warning that seems to sneak in in some
build environments.

Reported by: alc, bf1783 at googlemail com


# 194766 23-Jun-2009 kib

Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with: pho
Reviewed by: alc
Approved by: re (kensmith)


# 193511 05-Jun-2009 rwatson

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


# 191625 28-Apr-2009 kib

Fix typo.


# 191543 26-Apr-2009 alc

Eliminate an errant comment.

Discussed with: tegge


# 191478 25-Apr-2009 alc

Eliminate unnecessary calls to pmap_clear_modify(). Specifically, calling
pmap_clear_modify() on a page is pointless if that page is not mapped or
it is only mapped for read access. Instead, assert that the page is not
mapped or not mapped for write access as appropriate.

Eliminate unnecessary clearing of a page's dirty mask. Instead, assert
that the page's dirty mask is clear.


# 188859 20-Feb-2009 alc

Eliminate stale comments.


# 183474 29-Sep-2008 kib

Move the code for doing out-of-memory grass from vm_pageout_scan()
into the separate function vm_pageout_oom(). Supply a parameter for
vm_pageout_oom() describing a reason for the call.

Call vm_pageout_oom() from the swp_pager_meta_build() when swap zone
is exhausted.

Reviewed by: alc
Tested by: pho, jhb
MFC after: 2 weeks


# 182371 28-Aug-2008 attilio

Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


# 181019 30-Jul-2008 jhb

If the kernel has run out of metadata for swap, then explicitly panic()
instead of emitting a warning before deadlocking.

MFC after: 1 month


# 180446 11-Jul-2008 kib

Use the VM_ALLOC_INTERRUPT for the page requests when allocating memory
for the bio for swapout write. It allows the page allocator to drain
free page list deeper. As result, a deadlock where pageout deamon sleeps
waiting for bio to be allocated for swapout is no more reproducable in
practice.

Alan said that M_USE_RESERVE shall be ressurrected and used there, but
until this is implemented, M_NOWAIT does exactly what is needed.

Tested by: pho, kris
Reviewed by: alc
No objections from: phk
MFC after: 2 weeks (RELENG_7 only)


# 178792 05-May-2008 kmacy

add malloc flag to blist so that it can be used in ithread context

Reviewed by: alc, bsdimp


# 175294 13-Jan-2008 attilio

VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>


# 175202 09-Jan-2008 attilio

vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>


# 175157 08-Jan-2008 csjp

When MAC is enabled in the kernel, fix a panic triggered by a locking
assertion hit in swapoff_one() when we un-mount a swap partition. We
should be using curthread where we used thread0 before. This change
also replaces the thread argument with a credential argument, as the
MAC framework only requires the cred.

It should be noted that this allows the machine to be rebooted without
panicing with "cannot differ from curthread or NULL" when MAC is enabled.

Submitted by: rwatson
Reviewed by: attilio
MFC after: 2 weeks


# 173292 02-Nov-2007 maxim

o Fix panic message: it's swap_pager_putpages() not swap_pager_getpages().

Submitted by: Mark Tinguely


# 172930 24-Oct-2007 rwatson

Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

mac_<object>_<method/action>
mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer


# 171737 05-Aug-2007 alc

Consider a scenario in which one processor, call it Pt, is performing
vm_object_terminate() on a device-backed object at the same time that
another processor, call it Pa, is performing dev_pager_alloc() on the
same device. The problem is that vm_pager_object_lookup() should not be
allowed to return a doomed object, i.e., an object with OBJ_DEAD set,
but it does. In detail, the unfortunate sequence of events is: Pt in
vm_object_terminate() holds the doomed object's lock and sets OBJ_DEAD
on the object. Pa in dev_pager_alloc() holds dev_pager_sx and calls
vm_pager_object_lookup(), which returns the doomed object. Next, Pa
calls vm_object_reference(), which requires the doomed object's lock, so
Pa waits for Pt to release the doomed object's lock. Pt proceeds to the
point in vm_object_terminate() where it releases the doomed object's
lock. Pa is now able to complete vm_object_reference() because it can
now complete the acquisition of the doomed object's lock. So, now the
doomed object has a reference count of one! Pa releases dev_pager_sx
and returns the doomed object from dev_pager_alloc(). Pt now acquires
dev_pager_mtx, removes the doomed object from dev_pager_object_list,
releases dev_pager_mtx, and finally calls uma_zfree with the doomed
object. However, the doomed object is still in use by Pa.

Repeating my key point, vm_pager_object_lookup() must not return a
doomed object. Moreover, the test for the object's state, i.e.,
doomed or not, and the increment of the object's reference count
should be carried out atomically.

Reviewed by: kib
Approved by: re (kensmith)
MFC after: 3 weeks


# 171019 24-Jun-2007 alc

Eliminate GIANT_REQUIRED from swap_pager_putpages().

Approved by: re (mux)
MFC after: 1 week


# 170292 04-Jun-2007 attilio

Do proper "locking" for missing vmmeters part.
Now, we assume no more sched_lock protection for some of them and use the
distribuited loads method for vmmeter (distribuited through CPUs).

Reviewed by: alc, bde
Approved by: jeff (mentor)


# 170170 31-May-2007 attilio

Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)


# 170152 31-May-2007 kib

Revert UF_OPENING workaround for CURRENT.
Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file.

Proposed and reviewed by: jhb
Reviewed by: daichi (unionfs)
Approved by: re (kensmith)


# 169667 18-May-2007 jeff

- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.

Contributed by: Attilio Rao <attilio@FreeBSD.org>


# 168979 23-Apr-2007 rwatson

Audit pathnames looked up in swapon(2) and swapoff(2).

MFC after: 2 weeks
Obtained from: TrustedBSD Project


# 167086 27-Feb-2007 jhb

Use pause() rather than tsleep() on stack variables and function pointers.


# 166550 07-Feb-2007 jhb

- Move 'struct swdevt' back into swap_pager.h and expose it to userland.
- Restore support for fetching swap information from crash dumps via
kvm_get_swapinfo(3) to fix pstat -T/-s on crash dumps.

Reviewed by: arch@, phk
MFC after: 1 week


# 165809 05-Jan-2007 jhb

- Add a new function uma_zone_exhausted() to see if a zone is full.
- Add a printf in swp_pager_meta_build() to warn if the swapzone becomes
exhausted so that there's at least a warning before a box that runs out
of swapzone space before running out of swap space deadlocks.

MFC after: 1 week
Reviwed by: alc


# 164033 06-Nov-2006 rwatson

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# 163622 23-Oct-2006 alc

The page queues lock is no longer required by vm_page_wakeup().


# 163606 22-Oct-2006 rwatson

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


# 161125 09-Aug-2006 alc

Introduce a field to struct vm_page for storing flags that are
synchronized by the lock on the object containing the page.

Transition PG_WANTED and PG_SWAPINPROG to use the new field,
eliminating the need for holding the page queues lock when setting
or clearing these flags. Rename PG_WANTED and PG_SWAPINPROG to
VPO_WANTED and VPO_SWAPINPROG, respectively.

Eliminate the assertion that the page queues lock is held in
vm_page_io_finish().

Eliminate the acquisition and release of the page queues lock
around calls to vm_page_io_finish() in kern_sendfile() and
vfs_unbusy_pages().


# 161005 05-Aug-2006 alc

Remove a stale comment.


# 160960 03-Aug-2006 alc

When sleeping on a busy page, use the lock from the containing object
rather than the global page queues lock.


# 158387 10-May-2006 pjd

Use better order here.


# 157628 10-Apr-2006 pjd

On shutdown try to turn off all swap devices. This way GEOM providers are
properly closed on shutdown.

Requested by: ru
Reviewed by: alc
MFC after: 2 weeks


# 156420 08-Mar-2006 imp

Remove leading __ from __(inline|const|signed|volatile). They are
obsolete. This should reduce diffs to NetBSD as well.


# 154929 27-Jan-2006 cognet

Make sure b_vp and b_bufobj are NULL before calling relpbuf(), as it asserts
they are. They should be NULL at this point, except if we're coming from
swapdev_strategy().
It should only affect the case where we're swapping directly on a file over
NFS.


# 150418 21-Sep-2005 cognet

Make sure we have a bufobj before calling bstrategy().
I'm not sure this is the right thing to do, but at least I don't panic
anymore when swapping on a NFS file without using md(4).

X-MFC after: proper review


# 148200 20-Jul-2005 alc

Eliminate inconsistency in the setting of the B_DONE flag. Specifically,
make the b_iodone callback responsible for setting it if it is needed.
Previously, it was set unconditionally by bufdone() without holding
whichever lock is shared by the b_iodone callback and the corresponding
top-half function. Consequently, in a race, the top-half function could
conclude that operation was done before the b_iodone callback finished.
See, for example, aio_physwakeup() and aio_fphysio().

Note: I don't believe that the other, more widely-used b_iodone callbacks
are affected.

Discussed with: jeff
Reviewed by: phk
MFC after: 2 weeks


# 146459 20-May-2005 alc

Reduce the number of times that we acquire and release locks in
swap_pager_getpages().

MFC after: 1 week


# 146367 19-May-2005 alc

Remove calls to spl*().


# 146350 18-May-2005 alc

Revert revision 1.270: swp_pager_async_iodone() need not perform
VM_LOCK_GIANT().

Discussed with: jeff


# 145699 30-Apr-2005 jeff

- VM_LOCK_GIANT in the swap pager's iodone routine as VFS will soon call it
without Giant.

Sponsored by: Isilon Systems, Inc.


# 145584 27-Apr-2005 jeff

- Pass the ISOPEN flag to namei so filesystems will know we're about to
open them or otherwise access the data.


# 143821 18-Mar-2005 das

Move the swap_zone == NULL check earlier (i.e. before we dereference
the pointer.)

Found by: Coverity Prevent analysis tool


# 139825 07-Jan-2005 imp

/* -> /*- for license, minor formatting changes


# 139629 03-Jan-2005 phk

When allocating bio's in the swap_pager use M_WAITOK since the
alternative is much worse.


# 137910 20-Nov-2004 das

Disable U area swapping and remove the routines that create, destroy,
copy, and swap U areas.

Reviewed by: arch@


# 137299 06-Nov-2004 das

Fix the last known race in swapoff(), which could lead to a spurious panic:

swapoff: failed to locate %d swap blocks

The race occurred because putpages() can block between the time it
allocates swap space and the time it updates the swap metadata to
associate that space with a vm_object, so swapoff() would complain
about the temporary inconsistency. I hoped to fix this by making
swp_pager_getswapspace() and swp_pager_meta_build() a single atomic
operation, but that proved to be inconvenient. With this change,
swapoff() simply doesn't attempt to be so clever about detecting when
all the pageout activity to the target device should have drained.


# 137239 05-Nov-2004 das

Close a race in swapoff(). Here are the gory details:

In order to avoid livelock, swapoff() skips over objects with a
nonzero pip count and makes another pass if necessary. Since it is
impossible to know which objects we care about, it would choose an
arbitrary object with a nonzero pip count and wait for it before
making another pass, the theory being that this object would finish
paging about as quickly as the ones we care about. Unfortunately,
we may have slept since we acquired a reference to this object.
Hack around this problem by tsleep()ing on the pointer anyway, but
timeout after a fixed interval. More elegant solutions are possible,
but the ones I considered unnecessarily complicate this rare case.

Also, kill some nits that seem to have crept into the swapoff() code
in the last 75 revisions or so:

- Don't pass both sp and sp->sw_used to swap_pager_swapoff(), since
the latter can be derived from the former.

- Replace swp_pager_find_dev() with something simpler. There's no
need to iterate over the entire list of swap devices just to determine
if a given block is assigned to the one we're interested in.

- Expand the scope of the swhash_mtx in a couple of places so that it
isn't released and reacquired once for every hash bucket.

- Don't drop the swhash_mtx while holding a reference to an object.
We need to lock the object first. Unfortunately, doing so would
violate the established lock order, so use VM_OBJECT_TRYLOCK() and
try again on a subsequent pass if the object is already locked.

- Refactor swp_pager_force_pagein() and swap_pager_swapoff() a bit.


# 137191 04-Nov-2004 phk

De-couple our I/O bio request from the embedded bio in buf by explicitly
copying the fields.


# 137186 04-Nov-2004 phk

Remove buf->b_dev field.


# 136927 24-Oct-2004 phk

Move the buffer method vector (buf->b_op) to the bufobj.

Extend it with a strategy method.

Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY
song and dance.

Rename ibwrite to bufwrite().

Move the two NFS buf_ops to more sensible places, add bufstrategy
to them.

Add inlines for bwrite() and bstrategy() which calls through
buf->b_bufobj->b_ops->b_{write,strategy}().

Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().


# 136767 22-Oct-2004 phk

Add b_bufobj to struct buf which eventually will eliminate the need for b_vp.

Initialize b_bufobj for all buffers.

Make incore() and gbincore() take a bufobj instead of a vnode.

Make inmem() local to vfs_bio.c

Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj)
also VI_MTX() to BO_MTX(),

Make buf_vlist_add() take a bufobj instead of a vnode.

Eliminate other uses of bp->b_vp where bp->b_bufobj will do.

Various minor polishing: remove "register", turn panic into KASSERT,
use new function declarations, TAILQ_FOREACH_SAFE() etc.


# 136751 21-Oct-2004 phk

Move the VI_BWAIT flag into no bo_flag element of bufobj and call it BO_WWAIT

Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write
count on a bufobj. Bufobj_wdrop() replaces vwakeup().

Use these functions all relevant places except in ffs_softdep.c where
the use if interlocked_sleep() makes this impossible.

Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.


# 135746 24-Sep-2004 das

Don't look for swap blocks in objects that aren't swap-backed.
I expect that this will fix the following panic, reported by Jun:
swap_pager_isswapped: failed to locate all swap meta blocks

MT5 candidate


# 133318 08-Aug-2004 phk

Tag all geom classes in the tree with a version number.


# 132550 22-Jul-2004 alc

- Change uma_zone_set_obj() to call kmem_alloc_nofault() instead of
kmem_alloc_pageable(). The difference between these is that an errant
memory access to the zone will be detected sooner with
kmem_alloc_nofault().

The following changes serve to eliminate the following lock-order
reversal reported by witness:

1st 0xc1a3c084 vm object (vm object) @ vm/swap_pager.c:1311
2nd 0xc07acb00 swap_pager swhash (swap_pager swhash) @ vm/swap_pager.c:1797
3rd 0xc1804bdc vm object (vm object) @ vm/uma_core.c:931

There is no potential deadlock in this case. However, witness is unable
to recognize this because vm objects used by UMA have the same type as
ordinary vm objects. To remedy this, we make the following changes:

- Add a mutex type argument to VM_OBJECT_LOCK_INIT().
- Use the mutex type argument to assign distinct types to special
vm objects such as the kernel object, kmem object, and UMA objects.
- Define a static swap zone object for use by UMA. (Only static
objects are assigned a special mutex type.)


# 131665 06-Jul-2004 bms

Properly brucify a string by outdenting it.


# 130979 23-Jun-2004 bms

In swap_pager_getpages(), bp->b_dev can be NULL, particularly for the
case of NFS mounted swap, so do not try to dereference it.

While we're here, brucify the printf() call which happens when we
time out on acquisition of vm_page_queue_mtx.

PR: kern/67898
Submitted by: bde (style)


# 130640 17-Jun-2004 phk

Second half of the dev_t cleanup.

The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.


# 130585 16-Jun-2004 phk

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


# 128992 06-May-2004 alc

Make vm_page's PG_ZERO flag immutable between the time of the page's
allocation and deallocation. This flag's principal use is shortly after
allocation. For such cases, clearing the flag is pointless. The only
unusual use of PG_ZERO is in vfs_bio_clrbuf(). However, allocbuf() never
requests a prezeroed page. So, vfs_bio_clrbuf() never sees a prezeroed
page.

Reviewed by: tegge@


# 126135 23-Feb-2004 alc

- Substitute bdone() and bwait() from vfs_bio.c for
swap_pager_putpages()'s buffer completion code. Note: the only
difference between swp_pager_sync_iodone() and bdone(), aside from
the locking in the latter, was the unnecessary clearing of B_ASYNC.
- Remove an unnecessary pmap_page_protect() from
swp_pager_async_iodone().

Reviewed by: tegge


# 125755 12-Feb-2004 phk

Remove the absolute count g_access_abs() function since experience has
shown that it is not useful.

Rename the relative count g_access_rel() function to g_access(), only
the name has changed.

Change all g_access_rel() calls in our CVS tree to call g_access() instead.

Add an #ifndef BURN_BRIDGES #define of g_access_rel() for source
code compatibility.


# 125558 07-Feb-2004 alc

swp_pager_async_iodone() no longer requires Giant. Modify bufdone()
and swapgeom_done() to perform swp_pager_async_iodone() without Giant.

Reviewed by: tegge


# 125322 02-Feb-2004 phk

Check error return from g_clone_bio(). (netchild@)

Add XXX comment about why this is still not optimal. (phk@)

Submitted by: netchild@


# 124933 24-Jan-2004 alc

1. Statically initialize swap_pager_full and swap_pager_almost_full to the
full state. (When swap is added their state will change appropriately.)
2. Set swap_pager_full and swap_pager_almost_full to the full state when
the last swap device is removed.
Combined these changes eliminate nonsense messages from the kernel on swap-
less machines.

Item 2 submitted by: Divacky Roman <xdivac02@stud.fit.vutbr.cz>
Prodding by: phk


# 124133 04-Jan-2004 alc

Simplify the various pager allocation routines by computing the desired
object size once and assigning that value to a local variable.


# 124110 03-Jan-2004 alc

Reduce the scope of Giant in swap_pager_alloc().


# 123948 29-Dec-2003 alc

Remove swap_pager_un_object_list; it is unused.


# 121854 01-Nov-2003 alc

- Modify swap_pager_copy() and its callers such that the source and
destination objects are locked on entry and exit. Add comments to
the callers noting that the locks can be released by swap_pager_copy().
- Remove several instances of GIANT_REQUIRED.


# 121782 31-Oct-2003 alc

- Synchronize access to the swdevt's sw_flags with sw_dev_mtx.
- Remove several instances of GIANT_REQUIRED.


# 121727 30-Oct-2003 alc

- Synchronize access to the swdevt's sw_blist with sw_dev_mtx.
- Remove several instances of GIANT_REQUIRED.


# 121725 30-Oct-2003 alc

- Synchronize access to swdevhd using sw_dev_mtx.
- Use swp_sizecheck() rather than assignment to swap_pager_full in
swaponsomething().


# 121649 29-Oct-2003 alc

- Synchronize updates to nswapdev using sw_dev_mtx.


# 121646 29-Oct-2003 alc

- Avoid a race in swaponsomething(): Calculate the new swdevt's first and
end swblk and insert this new swdevt into the list of swap devices
in the same critical section.


# 121601 27-Oct-2003 alc

- Complete the synchronization of accesses to the swblock hash table.


# 121583 26-Oct-2003 alc

- Introduce and use a mutex synchronizing access to the swblock hash table.


# 121517 25-Oct-2003 alc

- Add some of the required vm object locking, including assertions where
the vm object lock is required and already held.


# 121455 24-Oct-2003 alc

- Push down Giant from vm_pageout() to vm_pageout_scan(), freeing
vm_pageout_page_stats() from Giant.
- Modify vm_pager_put_pages() and vm_pager_page_unswapped() to expect the
vm object to be locked on entry. (All of the pager routines now expect
this.)


# 121205 18-Oct-2003 phk

DuH!

bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in
the file)


# 121199 18-Oct-2003 phk

Initialize bp->b_offset before calling VOP_[SPEC]STRATEGY().
Remove stale comment about B_PHYS.


# 119663 02-Sep-2003 phk

Don't open with exclusive bit, swapon(8) wants to trash our swapdev.

Add XXX comment with a rating of this concept.


# 119591 30-Aug-2003 phk

Add a close() method to a swapdev.

Add a GEOM based backend.

Remove the device/VOP_SPECSTRATEGY() based backend.


# 119590 30-Aug-2003 phk

Protect the swapdevice tailq with a mutex.

Store the udev_t we will report to userland in the swdevt.


# 119575 30-Aug-2003 phk

Continue the objectification of the swapdev backends:

Remove the vnode and dev_t fields and replace them with a void *.

Introduce separate strategy functions for devices and regular (NFS)
vnodes.

For devices we don't need the vnode v_numoutput stuff.

Add a generic swaponsomething() function to add a swapdevice and
split the remainder of swaponvp() into swaponvp() and swapondev()
which calls this backend.


# 119574 30-Aug-2003 phk

Make the strategy function a method of the individual swapdev.


# 119573 30-Aug-2003 phk

Consistent use modern function definitions


# 118946 15-Aug-2003 phk

Eliminate unnecessary udev_t variable: we can derive it from the dev_t
when we need it.


# 118945 15-Aug-2003 phk

Make swaponvp() static to the swap_pager.


# 118544 06-Aug-2003 phk

Make the first two pages magic to protect the BSD labels rather than
only one.


# 118536 06-Aug-2003 phk

Staticize swap_pager_putpages()

Eliminate a lot of checkes to make sure requests are not cross-device
which is unnecessary with the new layout. We know a sequential request
cannot possibly be cross-device because there is a reserved page between
the devices.

Remove a couple of comments which no longer are relevant.


# 118527 06-Aug-2003 phk

Explicitly set B_PAGING


# 118521 06-Aug-2003 phk

Rip out the totally bogos vnode swapdev_vp with extreeme prejudice.

Don't mark buffers with B_KEEPGIANT, we don't drop giant in strategy
at this point in time.


# 118468 05-Aug-2003 phk

Use sparse struct initialization for struct pagerops.

Mark our buffers B_KEEPGIANT before sending them downstream.

Remove swap_pager_strategy implementation.


# 118418 04-Aug-2003 phk

Put an uncovered page between the swap devices, that way we can be sure
to not get any cross-device I/O requests. (The unallocated first page
protecting BSD labels already gave us this, but that hack may go away
at some point in time).

Remove the check for cross-device I/O requests in swap_pager_strategy.

Move the repeated statistics updating into flushchainbuf().


# 118398 03-Aug-2003 phk

Name swap_pager_find_dev() more correctly swp_pager_finde_dev().

Use ->bio_children to count child buffers, rather than abuse the
bio_caller1 pointer.

Expand the relevant bits of waitchainbuf() inline, this clarifies
the code a little bit.


# 118392 03-Aug-2003 phk

I accidentally hit undo before committing, fix the resulting off-by-one.


# 118390 03-Aug-2003 phk

Change the layout policy of the swap_pager from a hardcoded width
striping to a per device round-robin algorithm.

Because of the policy of not attempting to retain previous swap
allocation on page-out, this means that a newly added swap device
almost instantly takes its 1/N share of the I/O load but it takes
somewhat longer for it to assume it's 1/N share of the pages if there
is plenty of space on the other devices.

Change the 8G total swapspace limitation to 8G per device instead
by using a per device blist rather than one global blist. This
reduces the memory footprint by 75% (typically a couple hundred
kilobytes) for the common case with one swapdevice but NSWAPDEV=4.

Remove the compile time constant limit of number of swap devices,
there is no limit now. Instead of a fixed size array, store the
per swapdev structure in a TAILQ.

Total swap space is still addressed by a 32 bit page number and
therefore the upper limit is now 2^42 bytes = 16TB (for i386).

We still do not allocate the first page of each device in order to
give some amount of protection to any bsdlabel at the start of the
device.

A new device is appended after the existing devices in the swap space,
no attempt is made to fill in holes left behind by swapoff (this can
trivially be changed should it ever become a problem).

The sysctl vm.nswapdev now reflects the number of currently configured
swap devices.

Rename vm_swap_size to swap_pager_avail for consistency with other
exported names.

Change argument type for vm_proc_swapin_all() and swap_pager_isswapped()
to be a struct swdevt pointer rather than an index.

Not changed: we are still using blists to manage the free space,
but since the swapspace is no longer fragmented by the striping
different resource managers might fare better.


# 118286 31-Jul-2003 phk

Remove unused stuff.

Move used stuff to swap_pager.c where it belongs.

This file no longer exports anything to userland.


# 118047 26-Jul-2003 phk

Add a "int fd" argument to VOP_OPEN() which in the future will
contain the filedescriptor number on opens from userland.

The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful to in particular fdescfs and /dev/fd/*

For now pass -1 all over the place.


# 117903 22-Jul-2003 phk

Remove all but one of the inlines here, this reduces the code size by
2032 bytes and has no measurable impact on performance.


# 117866 22-Jul-2003 peter

swp_pager_hash() was called before it was instantiated inline. This made
gcc (quite rightly) unhappy. Move it earlier.


# 117747 18-Jul-2003 phk

Fix a printf format warning I introduced.
Use the macro max number of swap devices rather than cache the constant
in a variable.
Avoid a (now) pointless variable.


# 117725 18-Jul-2003 phk

If a proposed swap device exceeds the 8G artificial limit which out
radix-tree code imposes, truncate the device instead of rejecting it.


# 117724 18-Jul-2003 phk

Move the implementation of the vmspace_swap_count() (used only in
the "toss the largest process" emergency handling) from vm_map.c to
swap_pager.c.

The quantity calculated depends strongly on the internals of the
swap_pager and by moving it, we no longer need to expose the
internal metrics of the swap_pager to the world.


# 117723 18-Jul-2003 phk

Add a new function swap_pager_status() which reports the total size of the
paging space and how much of it is in use (in pages).

Use this interface from the Linuxolator instead of groping around in the
internals of the swap_pager.


# 117722 18-Jul-2003 phk

Merge swap_pager.c and vm_swap.c into swap_pager.c, the separation
is not natural and needlessly exposes a lot of dirty laundry.

Move private interfaces between the two from swap_pager.h to swap_pager.c
and staticize as much as possible.

No functional change.


# 117702 17-Jul-2003 phk

Make sure that SWP_NPAGES always has the same value in all source
files, so that SWAP_META_PAGES does not vary either.

swap_pager.c ended up with a value of 16, everybody else 8. Go with
the 16 for now.

This should only have any effect in the "kill processes because we
are out of swap" scenario, where it will make some sort of estimate
of something more precise.


# 116798 25-Jun-2003 alc

Maintain the lock on a vm object when calling vm_page_grab().


# 116629 20-Jun-2003 alc

Make swap_pager_haspages() static; remove unused function prototypes.


# 116437 16-Jun-2003 phk

This file was ignored by CVS in my last commit for some reason:

Remove pointless initialization of b_spc field, which now no longer
exists.


# 116280 13-Jun-2003 alc

Extend the scope of the vm object lock in swp_pager_async_iodone() to cover
a vm_page_free().


# 116279 13-Jun-2003 alc

Add vm object locking to various pagers' "get pages" methods, i386 stack
management functions, and a u area management function.


# 116226 11-Jun-2003 obrien

Use __FBSDID().


# 115987 07-Jun-2003 alc

Assert that the vm object is locked on entry to swap_pager_freespace().


# 114774 06-May-2003 alc

Lock the vm_object when performing vm_pager_deallocate().


# 114166 28-Apr-2003 alc

- Lock the vm_object when performing swap_pager_isswapped().
- Assert that the vm_object is locked in swap_pager_isswapped().


# 114074 26-Apr-2003 alc

- Convert vm_object_pip_wait() from using tsleep() to msleep().
- Make vm_object_pip_sleep() static.
- Lock the vm_object when performing vm_object_pip_wait().


# 113744 20-Apr-2003 alc

- Lock the vm_object when performing vm_object_pip_add().
- Remove an unnecessary variable.


# 113739 20-Apr-2003 alc

- Lock the vm_object when performing vm_object_pip_add().


# 113722 19-Apr-2003 alc

- Lock the vm_object when performing vm_object_pip_subtract().
- Assert that the vm_object lock is held in vm_object_pip_subtract().


# 113721 19-Apr-2003 alc

- Lock the vm_object when performing vm_object_pip_wakeupn().
- Assert that the vm_object lock is held in vm_object_pip_wakeupn().
- Add a new macro VM_OBJECT_LOCK_ASSERT().


# 111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 108600 03-Jan-2003 phk

Avoid extern decls in .c files by putting them in the vm/swap_pager.h
include file where they belong.
Share the dmmax_mask variable.


# 108589 03-Jan-2003 phk

Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op since
all BUF_STRATEGY did in the first place was call VOP_STRATEGY.


# 108011 18-Dec-2002 alc

Hold the page queues lock when performing vm_page_flag_set().


# 107913 15-Dec-2002 dillon

This is David Schultz's swapoff code which I am finally able to commit.
This should be considered highly experimental for the moment.

Submitted by: David Schultz <dschultz@uclink.Berkeley.EDU>
MFC after: 3 weeks


# 107039 18-Nov-2002 alc

Remove vm_page_protect(). Instead, use pmap_page_protect() directly.


# 106778 11-Nov-2002 cognet

Remove extra #include<sys/vmmeter.h>.


# 104094 28-Sep-2002 phk

Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by: FlexeLint warning #512


# 103924 24-Sep-2002 jeff

- Lock access to numoutput on the swap devices.


# 102738 31-Aug-2002 dillon

Reduce the maximum KVA reserved for swap meta structures from 70 to 32 MB.
Reduce the swap meta calculation by a factor of 2, it's still massive overkill.

X-MFC after: immediately


# 100452 21-Jul-2002 alc

o Lock page queue accesses by vm_page_free().


# 100415 20-Jul-2002 alc

o Lock page queue accesses by vm_page_try_to_cache(). (The accesses
in kern/vfs_bio.c are already locked.)
o Assert that the page queues lock is held in vm_page_try_to_cache().


# 98892 26-Jun-2002 iedowse

Avoid using the 64-bit vm_pindex_t in a few places where 64-bit
types are not required, as the overhead is unnecessary:

o In the i386 pmap_protect(), `sindex' and `eindex' represent page
indices within the 32-bit virtual address space.
o In swp_pager_meta_build() and swp_pager_meta_ctl(), use a temporary
variable to store the low few bits of a vm_pindex_t that gets used
as an array index.
o vm_uiomove() uses `osize' and `idx' for page offsets within a
map entry.
o In vm_object_split(), `idx' is a page offset within a map entry.


# 98891 26-Jun-2002 iedowse

Use an explicit cast to avoid relying on sign extension to do the
right thing in code such as `vm_pindex_t x = ~SWAP_META_MASK'.

Reviewed by: dillon


# 98607 22-Jun-2002 alc

o Replace GIANT_REQUIRED in swap_pager_alloc() by the acquisition and
release of Giant. (Annotate as MPSAFE.)


# 93818 04-Apr-2002 jhb

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


# 92748 20-Mar-2002 jeff

Remove references to vm_zone.h and switch over to the new uma API.


# 92727 19-Mar-2002 alfred

Remove __P.


# 92654 19-Mar-2002 jeff

This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by: arch@


# 92029 10-Mar-2002 eivind

- Remove a number of extra newlines that do not belong here according to
style(9)
- Minor space adjustment in cases where we have "( ", " )", if(), return(),
while(), for(), etc.
- Add /* SYMBOL */ after a few #endifs.

Reviewed by: alc


# 91420 27-Feb-2002 jhb

Use thread0.td_ucred instead of proc0.p_ucred. This change is cosmetic
and isn't strictly required. However, it lowers the number of false
positives found when grep'ing the kernel sources for p_ucred to ensure
proper locking.


# 91063 22-Feb-2002 phk

GC: BIO_ORDERED, various infrastructure dealing with BIO_ORDERED.


# 85016 15-Oct-2001 tegge

Don't use an uninitialized field reserved for callers in the bio structure
passed to swap_pager_strategy(). Instead, use a field reserved for drivers
and initialize it before usage.

Reviewed by: dillon


# 84827 11-Oct-2001 jhb

Change the kernel's ucred API as follows:
- crhold() returns a reference to the ucred whose refcount it bumps.
- crcopy() now simply copies the credentials from one credential to
another and has no return value.
- a new crshared() primitive is added which returns true if a ucred's
refcount is > 1 and false (0) otherwise.


# 81933 19-Aug-2001 dillon

Limit the amount of KVM reserved for the buffer cache and for swap-meta
information. The default limits only effect machines with > 1GB of ram
and can be overriden with two new kernel conf variables VM_SWZONE_SIZE_MAX
and VM_BCACHE_SIZE_MAX, or with loader variables kern.maxswzone and
kern.maxbcache. This has the effect of leaving more KVM available for
sizing NMBCLUSTERS and 'maxusers' and should avoid tripups where a sysad
adds memory to a machine and then sees the kernel panic on boot due to
running out of KVM.

Also change the default swap-meta auto-sizing calculation to allocate half
of what it was previously allocating. The prior defaults were way too high.
Note that we cannot afford to run out of swap-meta structures so we still
stay somewhat conservative here.


# 81029 02-Aug-2001 alfred

Fixups for the initial allocation by dillon:
1) allocate fewer buckets
2) when failing to allocate swap zone, keep reducing the zone by
a third rather than a half in order to reduce the chance of
allocating way too little.

I also moved around some code for readability.

Suggested by: dillon
Reviewed by: dillon


# 79242 04-Jul-2001 dillon

whitespace / register cleanup


# 79224 04-Jul-2001 dillon

With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.


# 78622 22-Jun-2001 jhb

- Protect all accesses to nsw_[rw]count{,_{,a}sync} with the pbuf mutex.
- Don't drop the vm mutex while grabbing the pbuf mutex to manipulate
said variables.


# 77088 23-May-2001 jhb

- Fix the sw_alloc_interlock to actually lock itself when the lock is
acquired.
- Assert Giant is held in the strategy, getpages, and putpages methods and
the getchainbuf, flushchainbuf, and waitchainbuf functions.
- Always call flushchainbuf() w/o the VM lock.


# 77036 23-May-2001 alfred

aquire Giant when playing with the buffercache and doing IO.
use msleep against the vm mutex while waiting for a page IO to complete.


# 77010 22-May-2001 alfred

aquire vm mutex in swp_pager_async_iodone. Don't call swp_pager_async_iodone
with the mutex held.


# 76827 18-May-2001 alfred

Introduce a global lock for the vm subsystem (vm_mtx).

vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb


# 76322 06-May-2001 phk

Actually biofinish(struct bio *, struct devstat *, int error) is more general
than the bioerror().

Most of this patch is generated by scripts.


# 75675 18-Apr-2001 alfred

Protect pager object creation with sx locks.

Protect pager object list manipulation with a mutex.

It doesn't look possible to combine them under a single sx lock because
creation may block and we can't have the object list manipulation block
on anything other than a mutex because of interrupt requests.


# 75474 13-Apr-2001 alfred

protect pbufs and associated counts with a mutex


# 72949 23-Feb-2001 rwatson

Introduce per-swap area accounting in the VM system, and export
this information via the vm.nswapdev sysctl (number of swap areas)
and vm.swapdevX nodes (where X is the device), which contain the MIBs
dev, blocks, used, and flags. These changes are required to allow
top and other userland swap-monitoring utilities to run without
setgid kmem.

Submitted by: Thomas Moestl <tmoestl@gmx.net>
Reviewed by: freebsd-audit


# 69972 13-Dec-2000 tanimura

- If swap metadata does not fit into the KVM, reduce the number of
struct swblock entries by dividing the number of the entries by 2
until the swap metadata fits.

- Reject swapon(2) upon failure of swap_zone allocation.

This is just a temporary fix. Better solutions include:
(suggested by: dillon)

o reserving swap in SWAP_META_PAGES chunks, and
o swapping the swblock structures themselves.

Reviewed by: alfred, dillon


# 69781 08-Dec-2000 dwmalone

Convert more malloc+bzero to malloc+M_ZERO.

Submitted by: josh@zipperup.org
Submitted by: Robert Drehmel <robd@gmx.net>


# 68921 19-Nov-2000 rwatson

o Export dmmax ("Maximum size of a swap block") using SYSCTL_INT.
This removes a reason that systat requires setgid kmem. More to
come.


# 68885 18-Nov-2000 dillon

Implement a low-memory deadlock solution.

Removed most of the hacks that were trying to deal with low-memory
situations prior to now.

The new code is based on the concept that I/O must be able to function in
a low memory situation. All major modules related to I/O (except
networking) have been adjusted to allow allocation out of the system
reserve memory pool. These modules now detect a low memory situation but
rather then block they instead continue to operate, then return resources
to the memory pool instead of cache them or leave them wired.

Code has been added to stall in a low-memory situation prior to a vnode
being locked.

Thus situations where a process blocks in a low-memory condition while
holding a locked vnode have been reduced to near nothing. Not only will
I/O continue to operate, but many prior deadlock conditions simply no
longer exist.

Implement a number of VFS/BIO fixes

(found by Ian): in biodone(), bogus-page replacement code, the loop
was not properly incrementing loop variables prior to a continue
statement. We do not believe this code can be hit anyway but we
aren't taking any chances. We'll turn the whole section into a
panic (as it already is in brelse()) after the release is rolled.

In biodone(), the foff calculation was incorrectly
clamped to the iosize, causing the wrong foff to be calculated
for pages in the case of an I/O error or biodone() called without
initiating I/O. The problem always caused a panic before. Now it
doesn't. The problem is mainly an issue with NFS.

Fixed casts for ~PAGE_MASK. This code worked properly before only
because the calculations use signed arithmatic. Better to properly
extend PAGE_MASK first before inverting it for the 64 bit masking
op.

In brelse(), the bogus_page fixup code was improperly throwing
away the original contents of 'm' when it did the j-loop to
fix the bogus pages. The result was that it would potentially
invalidate parts of the *WRONG* page(!), leading to corruption.

There may still be cases where a background bitmap write is
being duplicated, causing potential corruption. We have identified
a potentially serious bug related to this but the fix is still TBD.
So instead this patch contains a KASSERT to detect the problem
and panic the machine rather then continue to corrupt the filesystem.
The problem does not occur very often.. it is very hard to
reproduce, and it may or may not be the cause of the corruption
people have reported.

Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>)
Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>


# 68883 18-Nov-2000 dillon

This patchset fixes a large number of file descriptor race conditions.
Pre-rfork code assumed inherent locking of a process's file descriptor
array. However, with the advent of rfork() the file descriptor table
could be shared between processes. This patch closes over a dozen
serious race conditions related to one thread manipulating the table
(e.g. closing or dup()ing a descriptor) while another is blocked in
an open(), close(), fcntl(), read(), write(), etc...

PR: kern/11629
Discussed with: Alexander Viro <viro@math.psu.edu>


# 67082 13-Oct-2000 dillon

The swap bitmap allocator was not calculating the bitmap size properly
in the face of non-stripe-aligned swap areas. The bug could cause a
panic during boot.

Refuse to configure a swap area that is too large (67 GB or so)

Properly document the power-of-2 requirement for SWB_NPAGES.

The patch is slightly different then the one Tor enclosed in the P.R.,
but accomplishes the same thing.

PR: kern/20273
Submitted by: Tor.Egge@fast.no


# 60755 21-May-2000 peter

Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly
to various pmap_*() functions instead of looking up the physical address
and passing that. In many cases, the first thing the pmap code was doing
was going to a lot of trouble to get back the original vm_page_t, or
it's shadow pv_table entry.

Inspired by: John Dyson's 1998 patches.

Also:
Eliminate pv_table as a seperate thing and build it into a machine
dependent part of vm_page_t. This eliminates having a seperate set of
structions that shadow each other in a 1:1 fashion that we often went to
a lot of trouble to translate from one to the other. (see above)
This happens to save 4 bytes of physical memory for each page in the
system. (8 bytes on the Alpha).

Eliminate the use of the phys_avail[] array to determine if a page is
managed (ie: it has pv_entries etc). Store this information in a flag.
Things like device_pager set it because they create vm_page_t's on the
fly that do not have pv_entries. This makes it easier to "unmanage" a
page of physical memory (this will be taken advantage of in subsequent
commits).

Add a function to add a new page to the freelist. This could be used
for reclaiming the previously wasted pages left over from preloaded
loader(8) files.

Reviewed by: dillon


# 60041 05-May-2000 phk

Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by: peter


# 59915 03-May-2000 phk

Convert the vm_pager_strategy() interface to take a struct bio instead of
a struct buf. Don't try to examine B_ASYNC, it is a layering violation
to do so. The only current user of this interface is vn(4) which, since
it emulates a disk interface, operates on struct bio already.


# 59866 01-May-2000 phk

Move and staticize the bufchain functions so they become local to the
only piece of code using them. This will ease a rewrite of them.


# 59249 15-Apr-2000 phk

Complete the bio/buf divorce for all code below devfs::strategy

Exceptions:
Vinum untouched. This means that it cannot be compiled.
Greg Lehey is on the case.

CCD not converted yet, casts to struct buf (still safe)

atapi-cd casts to struct buf to examine B_PHYS


# 58934 02-Apr-2000 phk

Move B_ERROR flag to b_ioflags and call it BIO_ERROR.

(Much of this done by script)

Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.

Add bio_queue field for struct bio aware disksort.

Address a lot of stylistic issues brought up by bde.


# 58714 28-Mar-2000 dillon

Misattribution - the excellent SPLASSERT work is being done by
Paul Saab <paul@mu.org>, of course!


# 58708 27-Mar-2000 dillon

Add necessary spl protection for swapper. The problem was located by
Alfred while testing his SPLASSERT stuff. This is not a complete fix,
more protections are probably needed.


# 58705 27-Mar-2000 charnier

Revert spelling mistake I made in the previous commit
Requested by: Alan and Bruce


# 58634 26-Mar-2000 charnier

Spelling


# 58462 22-Mar-2000 phk

Fix one place which knew that B_WRITE was zero.

Fix a stylistic mistake of mine while here.

Found by: Stephen Hocking <shocking@prth.pgs.com>


# 58349 20-Mar-2000 phk

Rename the existing BUF_STRATEGY() to DEV_STRATEGY()

substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)

substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)

This patch is machine generated except for the ccd.c and buf.h parts.


# 58345 20-Mar-2000 phk

Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd. The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue. It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch were machine generated (Thanks to style(9) compliance!)

Vinum users: Greg has not had time to test this yet, be careful.


# 58132 16-Mar-2000 phk

Eliminate the undocumented, experimental, non-delivering and highly
dangerous MAX_PERF option.


# 55175 28-Dec-1999 peter

Fix the swap backed vn case - this was broken by my rev 1.128 to
swap_pager.c and related commits.

Essentially swap_pager.c is backed out to before the changes, but
swapdev_vp is converted into a real vnode with just VOP_STRATEGY().
It no longer abuses specfs vnops and no longer needs a dev_t and
/dev/drum (or /dev/swapdev) for the intermediate layer.

This essentially restores the vnode interface as the interface to the
bottom of the swap pager, and vm_swap.c provides a clean vnode interface.

This will need to be revisited when we swap to files (vnodes) - which
is the other reason for keeping the vnode interface between the swap pager
and the swap devices.

OK'ed by: dillon


# 53594 22-Nov-1999 phk

Isolate the swapdev_vp "not quite" vnode in the only source file which
needs it now that /dev/drum is gone.

Reviewed by: eivind, peter


# 53338 18-Nov-1999 peter

Remove the non-functional "swap device" userland front-end to the
multiplexed underlying swap devices (/dev/drum). The only thing it did
was to allow root to open /dev/drum, but not do anything with it.
Various utilities used to grovel around in here, but Matt has written
a much nicer (and clean) front-end to this for libkvm, and nothing uses
the old system any more.

The VM system was calling VOP_STRATEGY() on the vp of the first underlying
swap device (not the /dev/drum one, the first real device), and using
the VOP system to indirectly (and only) call swstrategy() to choose
an underlying device and enqueue it on that device. I have changed it
to avoid diverting through the VOP system and to call the only possible
target directly, saving a little bit of time and some complexity.

In all, nothing much changes, except some scaffolding to support the
roundabout way of calling swstrategy() is gone.

Matt gave me the ok to do this some time ago, and I apologize for taking
so long to get around to it.


# 52635 29-Oct-1999 phk

useracc() the prequel:

Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.


# 51339 17-Sep-1999 dillon

Fix a number of spl bugs related to reserving and freeing swap space.
Swap space can be freed from an interrupt and so swap reservation and
freeing must occur at splvm.

Add swap_pager_reserve() code to support a new swap pre-reservation
capability for the VN device.

Generally cleanup the swap code by simplifying the swp_pager_meta_build()
static function and consolidating the SWAPBLK_NONE test from a bit test
to an absolute compare. The bit test was left over from a rejected
swap allocation scheme that was not ultimately committed. A few other
minor cleanups were also made.

Reorganize the swap strategy code, again for VN support, to not
reallocate swap when writing as this messes up pre-reservation and
can fragment I/O unnecessarily as VN-baesd disk is messed around with.

Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 50269 23-Aug-1999 bde

Use devtoname to print dev_t's instead of casting them to u_long for
misprinting with %lx.

Cast pointers to intptr_t instead of casting them to long. Cosmetic.


# 49949 17-Aug-1999 alc

Correct an accidental omission of one "vm_page_undirty" replacement
from the previous commit.


# 49945 17-Aug-1999 alc

Add the (inline) function vm_page_undirty for clearing the dirty bitmask
of a vm_page.

Use it.

Submitted by: dillon


# 48833 16-Jul-1999 alc

Remove vm_object::last_read. It is used by the old swap pager, but
not by the new one, i.e., vm/swap_pager.c rev 1.108.

Reviewed by: dillon@backplane.com


# 48289 27-Jun-1999 peter

Kirk missed a required BUF_KERNPROC(). Even though this is a non-async
transfer, the b_iodone hook causes biodone() to release it from interrupt
context.


# 48225 26-Jun-1999 mckusick

Convert buffer locking from using the B_BUSY and B_WANTED flags to using
lockmgr locks. This commit should be functionally equivalent to the old
semantics. That is, all buffer locking is done with LK_EXCLUSIVE
requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will
be done in future commits.


# 46580 06-May-1999 phk

remove b_proc from struct buf, it's (now) unused.

Reviewed by: dillon, bde


# 44739 14-Mar-1999 julian

Submitted by: Matt Dillon <dillon@freebsd.org>
The old VN device broke in -4.x when the definition of B_PAGING
changed. This patch fixes this plus implements additional capabilities.
The new VN device can be backed by a file ( as per normal ), or it can
be directly backed by swap.

Due to dependencies in VM include files (on opt_xxx options) the new
vn device cannot be a module yet. This will be fixed in a later commit.
This commit delimitted by tags {PRE,POST}_MATT_VNDEV


# 44179 21-Feb-1999 dillon

Remove conditional sysctl's

Leave swap_async_max sysctl intact, remove swap_cluster_max sysctl.

Reviewed by: Alan Cox <alc@cs.rice.edu>


# 44178 21-Feb-1999 dillon

Reviewed by: Alan Cox <alc@cs.rice.edu>

Fix problem w/ low-swap/low-memory handling as reported by Bruce Evans.


# 44124 18-Feb-1999 dillon

Limit number of simultanious asynchronous swap pager I/Os that can
be in progress at any given moment.

Add two swap tuneables to sysctl:

vm.swap_async_max: 4
vm.swap_cluster_max: 16

Recommended values are a cluster size of 8 or 16 pages. async_max is
about right for 1-4 swap devices. Reduce to 2 if swap is eating too much
bandwidth, or even 1 if swap is both eating too much bandwidth and sitting
on a slow network (10BaseT).

The defaults work well across a broad range of configurations and should
normally be left alone.


# 43700 06-Feb-1999 dillon

Add hysteresis to the 'swap_pager_getswapspace; failed' console message.
Also widen the hysteresis levels a little ( these really should be
dynamically configured ).


# 43287 27-Jan-1999 dillon

Remove unintended trigraph sequences in comments for -Wall


# 43138 24-Jan-1999 dillon

Change all manual settings of vm_page_t->dirty = VM_PAGE_BITS_ALL
to use the vm_page_dirty() inline.

The inline can thus do sanity checks ( or not ) over all cases.


# 43129 24-Jan-1999 dillon

vm_pager_put_pages() is passed an rcval array to hold per-page return
values. The 'int' return value for the procedure was never used and
not well defined in any case when there are mixed errors on pages, so
it has been removed. vm_pager_put_pages() and associated vm_pager
functions now return void.


# 42966 21-Jan-1999 dillon

The default_pager's interaction with the swap_pager has been reorganized,
and the swap_pager has been completely replaced.

The new swap pager uses the new blist radix-tree based bitmap allocator
for low level swap allocation and deallocation. The new allocator
is effectively O(5) while the old one was O(N), and the new allocator
allocates all required memory at init time rather then at allocate
memory on the fly at run time.

Swap metadata is allocated in clusters and stored in a hash table,
eliminating linearly allocated structures.

Many, many features have been rewritten or added. Swap space is now
reallocated on the fly providing a poor-mans auto defragmentation of
swap space. Swap space that is no longer needed is freed on a timely
basis so no garbage collection is necessary.

Swap I/O is marked B_ASYNC and NFS has been fixed to do the right
thing with it, so NFS-based paging now has around 10x the performance
as it did before ( previously NFS enforced synchronous I/O for paging ).


# 42957 21-Jan-1999 dillon

This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.

Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>


# 42453 09-Jan-1999 eivind

KNFize, by bde.


# 42408 08-Jan-1999 eivind

Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as
discussed on -hackers.

Introduce 'KASSERT(assertion, ("panic message", args))' for simple
check + panic.

Reviewed by: msmith


# 42153 29-Dec-1998 dt

Don't free swap in swap_pager_getpages(): this code probably cause the
"dying daemons" problem. (I thought this code was introduced in rev.1.80,
but it just relaxed the condition.)

Also, kill related "suggest more swap space" warning (also introduced in
1.80). It was confusing, to say the least...

Requested by: msmith
Not objected by: dg


# 41250 19-Nov-1998 bde

Fixed a null pointer panic in spc_free(). swap_pager_putpages()
almost always causes this panic for the curproc != pageproc case.
This case apparently doesn't happen in normal operation, but it
happens when vm_page_alloc_contig() is called when there is a memory
hogging application that hasn't already been paged out.

PR: 8632
Reviewed by: info@opensound.com (Dev Mazumdar), dg
Broken in: rev.1.89 (1998/02/23)


# 40790 31-Oct-1998 peter

Use TAILQ macros for clean/dirty block list processing. Set b_xflags
rather than abusing the list next pointer with a magic number.


# 40286 13-Oct-1998 dg

Fixed two potentially serious classes of bugs:

1) The vnode pager wasn't properly tracking the file size due to
"size" being page rounded in some cases and not in others.
This sometimes resulted in corrupted files. First noticed by
Terry Lambert.
Fixed by changing the "size" pager_alloc parameter to be a 64bit
byte value (as opposed to a 32bit page index) and changing the
pagers and their callers to deal with this properly.
2) Fixed a bogus type cast in round_page() and trunc_page() that
caused some 64bit offsets and sizes to be scrambled. Removing
the cast required adding casts at a few dozen callers.
There may be problems with other bogus casts in close-by
macros. A quick check seemed to indicate that those were okay,
however.


# 38799 04-Sep-1998 dfr

Cosmetic changes to the PAGE_XXX macros to make them consistent with
the other objects in vm.


# 38517 24-Aug-1998 dfr

Change various syscalls to use size_t arguments instead of u_int.

Add some overflow checks to read/write (from bde).

Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags
and vm_object::paging_in_progress to use operations which are not
interruptable.

Reviewed by: Bruce Evans <bde@zeta.org.au>


# 38298 13-Aug-1998 dfr

Protect all modifications to paging_in_progress with splvm().


# 37918 28-Jul-1998 bde

Fixed two spl nesting bugs. They caused (at least) the entire pageout
daemon to run at splvm() forever after swap_pager_putpages() is called
from vm_pageout_scan().

Broken in: rev.1.189 (1998/02/23)


# 37555 11-Jul-1998 bde

Fixed printf format errors.


# 37384 04-Jul-1998 julian

VOP_STRATEGY grows an (struct vnode *) argument
as the value in b_vp is often not really what you want.
(and needs to be frobbed). more cleanups will follow this.
Reviewed by: Bruce Evans <bde@freebsd.org>


# 35669 04-May-1998 dyson

Work around some VM bugs, the worst being an overly aggressive
swap space free calculation. More complete fixes will be forthcoming,
in a week.


# 35497 29-Apr-1998 dyson

Tighten up management of memory and swap space during map allocation,
deallocation cycles. This should provide a measurable improvement
on swap and memory allocation on loaded systems. It is unlikely a
complete solution. Also, provide more map info with procfs.
Chuck Cranor spurred on this improvement.


# 35210 15-Apr-1998 bde

Support compiling with `gcc -ansi'.


# 34206 07-Mar-1998 dyson

This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code. These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances. Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code. This code might have been committed seperately, but
almost everything is interrelated.

1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
are fully valid.
2) Rather than deactivating erroneously read initial (header) pages in
kern_exec, we now free them.
3) Fix the rundown of non-VMIO buffers that are in an inconsistent
(missing vp) state.
4) Fix the disassociation of pages from buffers in brelse. The previous
code had rotted and was faulty in a couple of important circumstances.
5) Remove a gratuitious buffer wakeup in vfs_vmio_release.
6) Remove a crufty and currently unused cluster mechanism for VBLK
files in vfs_bio_awrite. When the code is functional, I'll add back
a cleaner version.
7) The page busy count wakeups assocated with the buffer cache usage were
incorrectly cleaned up in a previous commit by me. Revert to the
original, correct version, but with a cleaner implementation.
8) The cluster read code now tries to keep data associated with buffers
more aggressively (without breaking the heuristics) when it is presumed
that the read data (buffers) will be soon needed.
9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The
delay loop waiting is not useful for filesystem locks, due to the
length of the time intervals.
10) Correct and clean-up spec_getpages.
11) Implement a fully functional nfs_getpages, nfs_putpages.
12) Fix nfs_write so that modifications are coherent with the NFS data on
the server disk (at least as well as NFS seems to allow.)
13) Properly support MS_INVALIDATE on NFS.
14) Properly pass down MS_INVALIDATE to lower levels of the VM code from
vm_map_clean.
15) Better support the notion of pages being busy but valid, so that
fewer in-transit waits occur. (use p->busy more for pageouts instead
of PG_BUSY.) Since the page is fully valid, it is still usable for
reads.
16) It is possible (in error) for cached pages to be busy. Make the
page allocation code handle that case correctly. (It should probably
be a printf or panic, but I want the system to handle coding errors
robustly. I'll probably add a printf.)
17) Correct the design and usage of vm_page_sleep. It didn't handle
consistancy problems very well, so make the design a little less
lofty. After vm_page_sleep, if it ever blocked, it is still important
to relookup the page (if the object generation count changed), and
verify it's status (always.)
18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20) Fix vm_pager_put_pages and it's descendents to support an int flag
instead of a boolean, so that we can pass down the invalidate bit.


# 33936 01-Mar-1998 dyson

1) Use a more consistent page wait methodology.
2) Do not unnecessarily force page blocking when paging
pages out.
3) Further improve swap pager performance and correctness,
including fixing the paging in progress deadlock (except
in severe I/O error conditions.)
4) Enable vfs_ioopt=1 as a default.
5) Fix and enable the page prezeroing in SMP mode.

All in all, SMP systems especially should show a significant
improvement in "snappyness."


# 33817 25-Feb-1998 dyson

Fix page prezeroing for SMP, and fix some potential paging-in-progress
hangs. The paging-in-progress diagnosis was a result of Tor Egge's
excellent detective work.
Submitted by: Partially from Tor Egge.


# 33758 23-Feb-1998 dyson

Significantly improve the efficiency of the swap pager, which appears to
have declined due to code-rot over time. The swap pager rundown code
has been clean-up, and unneeded wakeups removed. Lots of splbio's
are changed to splvm's. Also, set the dynamic tunables for the
pageout daemon to be more sane for larger systems (thereby decreasing
the daemon overheadla.)


# 33181 09-Feb-1998 eivind

Staticize.


# 33134 06-Feb-1998 eivind

Back out DIAGNOSTIC changes.


# 33108 04-Feb-1998 eivind

Turn DIAGNOSTIC into a new-style option.


# 33034 02-Feb-1998 dyson

This fix should help the panic problems in -current. There
were some errors in "interval" management. Due to the
clustering mechanism, the code is necessarily complex and
error prone.


# 32952 01-Feb-1998 dyson

Fix a performance problem caused by an earlier commit.


# 32937 31-Jan-1998 dyson

Change the busy page mgmt, so that when pages are freed, they
MUST be PG_BUSY. It is bogus to free a page that isn't busy,
because it is in a state of being "unavailable" when being
freed. The additional advantage is that the page_remove code
has a better cross-check that the page should be busy and
unavailable for other use. There were some minor problems
with the collapse code, and this plugs those subtile "holes."

Also, the vfs_bio code wasn't checking correctly for PG_BUSY
pages. I am going to develop a more consistant scheme for
grabbing pages, busy or otherwise. For now, we are stuck
with the current morass.


# 32702 22-Jan-1998 dyson

VM level code cleanups.

1) Start using TSM.
Struct procs continue to point to upages structure, after being freed.
Struct vmspace continues to point to pte object and kva space for kstack.
u_map is now superfluous.
2) vm_map's don't need to be reference counted. They always exist either
in the kernel or in a vmspace. The vmspaces are managed by reference
counts.
3) Remove the "wired" vm_map nonsense.
4) No need to keep a cache of kernel stack kva's.
5) Get rid of strange looking ++var, and change to var++.
6) Change more data structures to use our "zone" allocator. Added
struct proc, struct vmspace and struct vnode. This saves a significant
amount of kva space and physical memory. Additionally, this enables
TSM for the zone managed memory.
7) Keep ioopt disabled for now.
8) Remove the now bogus "single use" map concept.
9) Use generation counts or id's for data structures residing in TSM, where
it allows us to avoid unneeded restart overhead during traversals, where
blocking might occur.
10) Account better for memory deficits, so the pageout daemon will be able
to make enough memory available (experimental.)
11) Fix some vnode locking problems. (From Tor, I think.)
12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
(experimental.)
13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c
code. Use generation counts, get rid of unneded collpase operations,
and clean up the cluster code.
14) Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step. Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)


# 32585 17-Jan-1998 dyson

Tie up some loose ends in vnode/object management. Remove an unneeded
config option in pmap. Fix a problem with faulting in pages. Clean-up
some loose ends in swap pager memory management.

The system should be much more stable, but all subtile bugs aren't fixed yet.


# 31970 24-Dec-1997 dyson

Support running with inadequate swap space. Additionally, the code
will complain with a suggestion of increasing it.


# 31493 02-Dec-1997 phk

In all such uses of struct buf: 's/b_un.b_addr/b_data/g'


# 28992 01-Sep-1997 bde

Removed unused #includes.


# 28990 01-Sep-1997 bde

Print a device number in hex instead of decimal.


# 28751 25-Aug-1997 bde

Fixed type mismatches for functions with args of type vm_prot_t and/or
vm_inherit_t. These types are smaller than ints, so the prototypes
should have used the promoted type (int) to match the old-style function
definitions. They use just vm_prot_t and/or vm_inherit_t. This depends
on gcc features to work. I fixed the definitions since this is easiest.
The correct fix may be to change the small types to u_int, to optimize
for time instead of space.


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 21529 11-Jan-1997 dyson

Prepare better for multi-platform by eliminating another required
pmap routine (pmap_is_referenced.) Upper level recoded to use
pmap_ts_referenced.


# 18893 12-Oct-1996 bde

Removed __pure's and __pure2's. __pure is a no-op for recent versions
of gcc by definition, and __pure2 is a no-op in effect (presumably the
compiler can see when an inline function has no side effects).


# 18169 08-Sep-1996 dyson

Addition of page coloring support. Various levels of coloring are afforded.
The default level works with minimal overhead, but one can also enable
full, efficient use of a 512K cache. (Parameters can be generated
to support arbitrary cache sizes also.)


# 17334 30-Jul-1996 dyson

Backed out the recent changes/enhancements to the VM code. The
problem with the 'shell scripts' was found, but there was a 'strange'
problem found with a 486 laptop that we could not find. This commit
backs the code back to 25-jul, and will be re-entered after the snapshot
in smaller (more easily tested) chunks.


# 17294 27-Jul-1996 dyson

This commit is meant to solve a couple of VM system problems or
performance issues.

1) The pmap module has had too many inlines, and so the
object file is simply bigger than it needs to be.
Some common code is also merged into subroutines.
2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls.
Unfortunately, a few have needed to be added also.
The removal caused the need for more vm_page_lookups.
I added lookup hints to minimize the need for the
page table lookup operations.
3) Removal of some bogus performance improvements, that
mostly made the code more complex (tracking individual
page table page updates unnecessarily). Those improvements
actually hurt 386 processors perf (not that people who
worry about perf use 386 processors anymore :-)).
4) Changed pv queue manipulations/structures to be TAILQ's.
5) The pv queue code has had some performance problems since
day one. Some significant scalability issues are resolved
by threading the pv entries from the pmap AND the physical
address instead of just the physical address. This makes
certain pmap operations run much faster. This does
not affect most micro-benchmarks, but should help loaded system
performance *significantly*. DG helped and came up with most
of the solution for this one.
6) Most if not all pmap bit operations follow the pattern:
pmap_test_bit();
pmap_clear_bit();
That made for twice the necessary pv list traversal. The
pmap interface now supports only pmap_tc_bit type operations:
pmap_[test/clear]_modified, pmap_[test/clear]_referenced.
Additionally, the modified routine now takes a vm_page_t arg
instead of a phys address. This eliminates a PHYS_TO_VM_PAGE
operation.
7) Several rewrites of routines that contain redundant code to
use common routines, so that there is a greater likelihood of
keeping the cache footprint smaller.


# 16274 10-Jun-1996 dyson

Mostly superficial code improvements, add a diagnostic. The
code improvements include significant simplification of the reservation
of the swap pager control blocks for reads. Add a panic for an inconsistent
swap pager control block count.


# 15873 22-May-1996 dyson

Initial support for MADV_FREE, support for pages that we don't care
about the contents anymore. This gives us alot of the advantage of
freeing individual pages through munmap, but with almost none of the
overhead.


# 15809 18-May-1996 dyson

This set of commits to the VM system does the following, and contain
contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>,
Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me:

More usage of the TAILQ macros. Additional minor fix to queue.h.
Performance enhancements to the pageout daemon.
Addition of a wait in the case that the pageout daemon
has to run immediately.
Slightly modify the pageout algorithm.
Significant revamp of the pmap/fork code:
1) PTE's and UPAGES's are NO LONGER in the process's map.
2) PTE's and UPAGES's reside in their own objects.
3) TOTAL elimination of recursive page table pagefaults.
4) The page directory now resides in the PTE object.
5) Implemented pmap_copy, thereby speeding up fork time.
6) Changed the pv entries so that the head is a pointer
and not an entire entry.
7) Significant cleanup of pmap_protect, and pmap_remove.
8) Removed significant amounts of machine dependent
fork code from vm_glue. Pushed much of that code into
the machine dependent pmap module.
9) Support more completely the reuse of already zeroed
pages (Page table pages and page directories) as being
already zeroed.
Performance and code cleanups in vm_map:
1) Improved and simplified allocation of map entries.
2) Improved vm_map_copy code.
3) Corrected some minor problems in the simplify code.
Implemented splvm (combo of splbio and splimp.) The VM code now
seldom uses splhigh.
Improved the speed of and simplified kmem_malloc.
Minor mod to vm_fault to avoid using pre-zeroed pages in the case
of objects with backing objects along with the already
existant condition of having a vnode. (If there is a backing
object, there will likely be a COW... With a COW, it isn't
necessary to start with a pre-zeroed page.)
Minor reorg of source to perhaps improve locality of ref.


# 15583 03-May-1996 phk

Another sweep over the pmap/vm macros, this time with more focus on
the usage. I'm not satisfied with the naming, but now at least there is
less bogus stuff around.


# 15543 02-May-1996 phk

removed:
CLBYTES PD_SHIFT PGSHIFT NBPG PGOFSET CLSIZELOG2 CLSIZE pdei()
ptei() kvtopte() ptetov() ispt() ptetoav() &c &c
new:
NPDEPG

Major macro cleanup.


# 14396 06-Mar-1996 dyson

Fix a problem in the swap pager that caused some of the pages that
were paged in under low swap space conditions to both loose their
backing store and their dirty bits. This would cause pages to
be demand zeroed under certain conditions in low VM space conditions
and consequential sig-11's or sig-10's. This situation was made
worse lately when the level for swap space reclaim threshold was
increased.


# 14364 03-Mar-1996 dyson

In order to fix some concurrency problems with the swap pager early
on in the FreeBSD development, I had made a global lock around the
rlist code. This was bogus, and now the lock is maintained on a
per resource list basis. This now allows the rlist code to be used for
almost any non-interrupt level application.


# 14316 02-Mar-1996 dyson

1) Eliminate unnecessary bzero of UPAGES.
2) Eliminate unnecessary copying of pages during/after forks.
3) Add user map simplification.


# 13790 31-Jan-1996 dg

"out of space" -> "out of swap space".


# 13490 19-Jan-1996 dyson

Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
overhead for merged cache.
Efficiency improvement for vfs_cluster. It used to do alot of redundant
calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files. Additionally,
fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources. The pageout code
will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
that happens every 30seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
buffers.


# 12904 17-Dec-1995 bde

Fixed 1TB filesize changes. Some pindexes had bogus names and types
but worked because vm_pindex_t is indistinuishable from vm_offset_t.


# 12820 14-Dec-1995 phk

Another mega commit to staticize things.


# 12819 14-Dec-1995 phk

A Major staticize sweep. Generates a couple of warnings that I'll deal
with later.
A number of unused vars removed.
A number of unused procs removed or #ifdefed.


# 12779 11-Dec-1995 dyson

Some new anti-deadlock code ended up messing up the paging stats. A modified
version of the code is now in place, and gausspage performance is back
up to where it should be.


# 12767 11-Dec-1995 dyson

Changes to support 1Tb filesizes. Pages are now named by an
(object,index) pair instead of (object,offset) pair.


# 12662 07-Dec-1995 dg

Untangled the vm.h include file spaghetti.


# 12591 03-Dec-1995 bde

Completed function declarations and/or added prototypes.

Staticized some functions.

__purified some functions. Some functions were bogusly declared as
returning `const'. This hasn't done anything since gcc-2.5. For
later versions of gcc, the equivalent is __attribute__((const)) at
the end of function declarations.


# 12423 20-Nov-1995 phk

Remove unused vars & funcs, make things static, protoize a little bit.


# 12325 16-Nov-1995 bde

Fixed recent staticizations. Some protypes for static functions were
left in headers and not staticized.


# 12300 14-Nov-1995 phk

staticize.


# 12006 02-Nov-1995 dg

Move page fixups (pmap_clear_modify, etc) that happen after paging input
completes out of vm_fault and into the pagers. This get rid of some
redundancy and improves the architecture.

Reviewed by: John Dyson <dyson>


# 10984 24-Sep-1995 dg

Check that the swap block is valid before including it in a cluster.

Submitted by: John Dyson


# 10670 10-Sep-1995 dyson

Make sure that the prezero flag is cleared when needed.


# 10579 06-Sep-1995 dyson

Fixed a sign reversal problem -- might have cause some Sig-11s that
people have been seeing.


# 10556 04-Sep-1995 dyson

Allow the fault code to use additional clustering info from both
bmap and the swap pager. Improved fault clustering performance.


# 9548 16-Jul-1995 dg

1) Merged swpager structure into vm_object.
2) Changed swap_pager internal interfaces to cope w/#1.
3) Eliminated object->copy as we no longer have copy objects.
4) Minor stylistic changes.


# 9507 13-Jul-1995 dg

NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!

Much needed overhaul of the VM system. Included in this first round of
changes:

1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".

2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
escentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.

3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.

4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.

5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.

6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.

7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.

8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.

9) Some almost useless debugging code removed.

10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure escentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.

11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.

12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although will need some additional decision code to do
this, of course).

13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.

14) (dyson) Sharing maps have been removed. It's marginal usefulness in a
threads design can be worked around in other ways. Both #12 and #13
were done to simplify the code and improve readability and maintain-
ability. (As were most all of these changes)

TODO:

1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.

2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.

3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.

4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.

5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).


# 8876 30-May-1995 rgrimes

Remove trailing whitespace.


# 8585 18-May-1995 dg

Accessing pages beyond the end of a mapped file results in internal
inconsistencies in the VM system that eventually lead to a panic. These
changes fix the behavior to conform to the behavior in SunOS, which is
to deny faults to pages beyond the EOF (returning SIGBUS). Internally,
this is implemented by requiring faults to be within the object size
boundaries. These changes exposed another bug, namely that passing in
an offset to mmap when trying to map an unnamed anonymous region also
results in internal inconsistencies. In this case, the offset is forced
to zero.

Reviewed by: John Dyson and others


# 8504 14-May-1995 dg

Changed swap partition handling/allocation so that it doesn't
require specific partitions be mentioned in the kernel config
file ("swap on foo" is now obsolete).

From Poul-Henning:

The visible effect is this:

As default, unless
options "NSWAPDEV=23"
is in your config, you will have four swap-devices.
You can swapon(2) any block device you feel like, it doesn't have
to be in the kernel config.

There is a performance/resource win available by getting the NSWAPDEV right
(but only if you have just one swap-device ??), but using that as default
would be too restrictive.

The invisible effect is that:

Swap-handling disappears from the $arch part of the kernel.
It gets a lot simpler (-145 lines) and cleaner.

Reviewed by: John Dyson, David Greenman
Submitted by: Poul-Henning Kamp, with minor changes by me.


# 8416 10-May-1995 dg

Changed "handle" from type caddr_t to void *; "handle" is several different
types of pointers, and "char *" is a bad choice for the type.


# 8319 07-May-1995 dyson

Another error in the correction for trimming swap allocation for
small objects. (This code needs to be revisited.)


# 8315 07-May-1995 dyson

Fixed a calculation that would once-in-a-while cause the swap_pager
to emit spurious page outside of object type messages. It is not
a fatal condition anyway, so the message will be omitted for
release. Also, the code that "clips" the allocation size, associated
with the above problem, was fixed.


# 7935 19-Apr-1995 dg

New flag: B_PAGING. Added as part of the vn driver hack.


# 7887 16-Apr-1995 dg

Removed obsolete/unused variable declarations.
Removed some extern declarations and included the correct include files.


# 7883 16-Apr-1995 dg

Moved some zero-initialized variables into .bss. Made code intended to be
called only from DDB #ifdef DDB. Removed some completely unused globals.


# 7240 22-Mar-1995 dg

Added a check for wrong object size; print a warning, but deal with it
correctly. The warning will tell us that there is a bug somewhere else
in sizing the object correctly.

Submitted by: John Dyson


# 7170 19-Mar-1995 dg

Removed redundant newlines that were in some panic strings.


# 7007 11-Mar-1995 dg

Clear OBJ_INTERNAL flag for device pager objects and named anonymous
objects.


# 6816 01-Mar-1995 dg

Various changes from John and myself that do the following:

New functions create - vm_object_pip_wakeup and pagedaemon_wakeup that
are used to reduce the actual number of wakeups.
New function vm_page_protect which is used in conjuction with some new
page flags to reduce the number of calls to pmap_page_protect.
Minor changes to reduce unnecessary spl nesting.
Rewrote vm_page_alloc() to improve readability.
Various other mostly cosmetic changes.


# 6703 25-Feb-1995 dg

Fixed severely broken printf (arguments out of order, no newline).


# 6618 22-Feb-1995 dg

Only do object paging_in_progress wakeups if someone is waiting on this
condition.

Submitted by: John Dyson


# 6585 20-Feb-1995 dg

Deprecated remaining use of vm_deallocate. Deprecated vm_allocate_with_
pager(). Almost completely rewrote vm_mmap(); when John gets done with
the bottom half, it will be a complete rewrite. Deprecated most use of
vm_object_setpager(). Removed side effect of setting object persist
in vm_object_enter and moved this into the pager(s). A few other
cosmetic changes.


# 6129 02-Feb-1995 dg

swap_pager.c:
Fixed long standing bug in freeing swap space during object collapses.
Fixed 'out of space' messages from printing out too often.
Modified to use new kmem_malloc() calling convention.
Implemented an additional stat in the swap pager struct to count the
amount of space allocated to that pager. This may be removed at some
point in the future.
Minimized unnecessary wakeups.

vm_fault.c:
Don't try to collect fault stats on 'swapped' processes - there aren't
any upages to store the stats in.
Changed read-ahead policy (again!).

vm_glue.c:
Be sure to gain a reference to the process's map before swapping.
Be sure to lose it when done.

kern_malloc.c:
Added the ability to specify if allocations are at interrupt time or
are 'safe'; this affects what types of pages can be allocated.

vm_map.c:
Fixed a variety of map lock problems; there's still a lurking bug that
will eventually bite.

vm_object.c:
Explicitly initialize the object fields rather than bzeroing the struct.
Eliminated the 'rcollapse' code and folded it's functionality into the
"real" collapse routine.
Moved an object_unlock() so that the backing_object is protected in
the qcollapse routine.
Make sure nobody fools with the backing_object when we're destroying it.
Added some diagnostic code which can be called from the debugger that
looks through all the internal objects and makes certain that they
all belong to someone.

vm_page.c:
Fixed a rather serious logic bug that would result in random system
crashes. Changed pagedaemon wakeup policy (again!).

vm_pageout.c:
Removed unnecessary page rotations on the inactive queue.
Changed the number of pages to explicitly free to just free_reserved
level.

Submitted by: John Dyson


# 5841 24-Jan-1995 dg

Added ability to detect sequential faults and DTRT. (swap_pager.c)
Added hook for pmap_prefault() and use symbolic constant for new third
argument to vm_page_alloc() (vm_fault.c, various)
Changed the way that upages and page tables are held. (vm_glue.c)
Fixed architectural flaw in allocating pages at interrupt time that was
introduced with the merged cache changes. (vm_page.c, various)
Adjusted some algorithms to acheive better paging performance and to
accomodate the fix for the architectural flaw mentioned above. (vm_pageout.c)
Fixed pbuf handling problem, changed policy on handling read-behind page.
(vnode_pager.c)

Submitted by: John Dyson


# 5464 10-Jan-1995 dg

Fixed some formatting weirdness that I overlooked in the previous commit.


# 5455 09-Jan-1995 dg

These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.

The majority of the merged VM/cache work is by John Dyson.

The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.

vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.

vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.

vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.

vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.

vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.

pmap.c vm_map.c
Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of
kernel PTs.

vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.

proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.

swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.

machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.

machdep.c, kern_exec.c, vm_glue.c
Implemented a seperate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.

ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.

Submitted by: John Dyson and David Greenman


# 5202 23-Dec-1994 dg

Initialize b_vnbuf.le_next before returning a new buffer in getpbuf and
trypbuf. Move a couple of splbio's to be slightly less conservative.


# 5186 22-Dec-1994 dg

Fixed a benign off by one error.


# 5166 18-Dec-1994 dg

Don't ever clear B_BUSY on a pbuf (or any other flag for that matter).
This appears to be the cause of some buffer confusion that leads to
a panic during heavy paging.

Submitted by: John Dyson


# 4440 13-Nov-1994 dg

Fixed bugs in accounting of swap space that resulted in the pager thinking
it was out of space when it really wasn't.

Submitted by: John Dyson


# 4207 06-Nov-1994 dg

Fixed return status from pagers. Ahem...the previous method would manufacture
data when it couldn't get it legitimately. :-(

Submitted by: John Dyson


# 3841 25-Oct-1994 dg

Improved I/O error reporting.


# 3766 22-Oct-1994 dg

Various changes to allow operation without any swapspace configured. Note
that this is intended for use only in floppy situations and is done at
the sacrifice of performance in that case (in ther words, this is not the
best solution, but works okay for this exceptional situation).

Submitted by: John Dyson


# 3612 15-Oct-1994 dg

1) Some of the counters in the vmmeter struct don't fit well into the Mach VM
scheme of things, so I've changed them to be more appropriate. page in/ous
are now associated with the pager that did them. Nuked v_fault as the
only fault of interest that wouldn't be already counted in v_trap is a VM
fault, and this is counted seperately.
2) Implemented most of the remaining counters and corrected the counting of
some that were done wrong. They are all almost correct now...just a few
minor ones left to fix.


# 3591 14-Oct-1994 dg

Got rid of redundant declaration warnings.


# 3573 13-Oct-1994 dg

Fixed bug where page modifications would be lost when swap space was
almost depleted.

Reviewed by: John Dyson


# 3451 09-Oct-1994 dg

Got rid of map.h. It's a leftover from the rmap code, and we use rlists.
Changed swapmap into swaplist.


# 3449 08-Oct-1994 phk

Cosmetics: unused vars, ()'s, #include's &c &c to silence gcc.
Reviewed by: davidg


# 3083 25-Sep-1994 dg

Disabled swap anti-fragmentation code. It reduces swap paging performance
by 20% in my tests, and it appears to be the cause of a swap leak.

Submitted by: John Dyson


# 2386 29-Aug-1994 dg

Patches from John Dyson to improve swap code efficiency.
Religiously add back pmap_clear_modify() in vnode_pager_input until we figure
out why system performance isn't what we expect.

Submitted by: John Dyson (swap_pager) & David Greenman (vnode_pager)


# 2112 18-Aug-1994 wollman

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


# 1895 07-Aug-1994 dg

Provide support for upcoming merged VM/buffer cache, and fixed a few bugs
that haven't appeared to manifest themselves (yet).

Submitted by: John Dyson


# 1887 06-Aug-1994 dg

Incorporated post 1.1.5 work from John Dyson. This includes performance
improvements via the new routines pmap_qenter/pmap_qremove and pmap_kenter/
pmap_kremove. These routine allow fast mapping of pages for those
architectures that have "normal" MMUs. Also included is a fix to the
pageout daemon to properly check a queue end condition.

Submitted by: John Dyson


# 1817 02-Aug-1994 dg

Added $Id$


# 1810 01-Aug-1994 dg

Removed all code related to the pagescan daemon, and changed 'act_count'
adjustments to compensate for a world without the pagescan daemon.


# 1549 25-May-1994 rgrimes

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# 1542 24-May-1994 rgrimes

This commit was generated by cvs2svn to compensate for changes in r1541,
which included commits to RCS files with non-trunk default branches.


# 1541 24-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources