History log of /freebsd-9.3-release/sys/vm/vm_map.h
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 267654 19-Jun-2014 gjb

Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 258870 03-Dec-2013 jhb

MFC 253471,253620,254430,254538:
Change mmap() to more optimally use superpages and provide support for
tweaking alignment of virtual mappings.
- Add a new address space allocation method (VMFS_OPTIMAL_SPACE) for
vm_map_find() that will try to alter the alignment of a mapping to match
any existing superpage mappings of the object being mapped. If no
suitable address range is found with the necessary alignment,
vm_map_find() will fall back to using the simple first-fit strategy
(VMFS_ANY_SPACE).
- Change mmap() without MAP_FIXED, shmat(), shm_map(), and the GEM mapping
ioctl to use VMFS_OPTIMAL_SPACE instead of VMFS_ANY_SPACE.
- MAP_ALIGNED(n) requests a mapping aligned on a boundary of (1 << n).
Requests for n >= number of bits in a pointer or less than the size of
a page fail with EINVAL. This matches the API provided by NetBSD.
- MAP_ALIGNED_SUPER is a special case of MAP_ALIGNED. It can be used
to optimize the chances of using large pages. By default it will align
the mapping on a large page boundary (the system is free to choose any
large page size to align to that seems best for the mapping request).
However, if the object being mapped is already using large pages, then
it will align the virtual mapping to match the existing large pages in
the object instead.
- Internally, VMFS_ALIGNED_SPACE is now renamed to VMFS_SUPER_SPACE, and
VMFS_ALIGNED_SPACE(n) is repurposed for specifying a specific alignment.
MAP_ALIGNED(n) maps to using VMFS_ALIGNED_SPACE(n), while
MAP_ALIGNED_SUPER maps to VMFS_SUPER_SPACE.
- mmap() of a device object now uses VMFS_OPTIMAL_SPACE rather than
explicitly using VMFS_SUPER_SPACE. All device objects are forced to
use a specific color on creation, so VMFS_OPTIMAL_SPACE is effectively
equivalent.

PR: ports/184173 (exp-run)


# 254089 08-Aug-2013 kib

MFC r253190:
Add the thread owner of the MAP_ENTRY_IN_TRANSITION flag to struct
vm_map_entry. In vm_map_wire() and vm_map_unwire(), only process the
entries which transition owner is the current thread.


# 245787 22-Jan-2013 zont

MFC r240145:
- Simplify VM code by using vmspace_wired_count() for counting wired
memory of a process.

MFC r245255:
- Reduce kernel size by removing unnecessary pointer indirections.

GENERIC kernel size reduced in 16 bytes and RACCT kernel in 336 bytes.

MFC r245296:
- Improve readability of sys_obreak().

MFC r245421:
- Get rid of unused function vmspace_wired_count().


# 239565 22-Aug-2012 mdf

MFC r238502:

Fix a bug with memguard(9) on 32-bit architectures without a
VM_KMEM_MAX_SIZE.

The code was not taking into account the size of the kernel_map, which
the kmem_map is allocated from, so it could produce a sub-map size too
large to fit. The simplest solution is to ignore VM_KMEM_MAX entirely
and base the memguard map's size off the kernel_map's size, since this
is always relevant and always smaller.

Found by: Justin Hibbits


# 235876 24-May-2012 alc

MFC r235230
Give vm_fault()'s sequential access optimization a makeover.

There are two aspects to the sequential access optimization: (1) read ahead
of pages that are expected to be accessed in the near future and (2) unmap
and cache behind of pages that are not expected to be accessed again. This
revision changes both aspects.

The read ahead optimization is now more effective. It starts with the same
initial read window as before, but arithmetically grows the window on
sequential page faults. This can yield increased read bandwidth. For
example, on one of my machines, a program using mmap() to read a file that
is several times larger than the machine's physical memory takes about 17%
less time to complete.

The unmap and cache behind optimization is now more selectively applied.
The read ahead window must grow to its maximum size before unmap and cache
behind is performed. This significantly reduces the number of times that
pages are unmapped and cached only to be reactivated a short time later.

The unmap and cache behind optimization now clears each page's referenced
flag. Previously, in the case of dirty pages, if the containing file was
still mapped at the time that the page daemon examined the dirty pages,
they would be reactivated.

From a stylistic standpoint, this revision also cleanly separates the
implementation of the read ahead and unmap/cache behind optimizations.


# 233001 15-Mar-2012 kib

MFC r232071:
Account the writeable shared mappings backed by file in the vnode
v_writecount.

MFC r232103:
Place the if() at the right location.

MFC note: the added struct vm_object un_pager.vnp.writemappings member
is located after the fields of struct vm_object that could be accessed
from the modules.


# 231889 17-Feb-2012 kib

MFC r231526:
Close a race due to dropping of the map lock between creating map entry
for a shared mapping and marking the entry for inheritance.


# 225736 22-Sep-2011 kensmith

Copy head to stable/9 as part of 9.0-RELEASE release cycle.

Approved by: re (implicit)


# 219819 21-Mar-2011 jeff

- Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
and other miscellaneous small features.


# 219124 01-Mar-2011 brucec

Change the return type of vmspace_swap_count to a long to match the other
vmspace_*_count functions.

MFC after: 3 days


# 218966 23-Feb-2011 brucec

Calculate and return the count in vmspace_swap_count as a vm_offset_t
instead of an int to avoid overflow.

While here, clean up some style(9) issues.

PR: kern/152200
Reviewed by: kib
MFC after: 2 weeks


# 216604 20-Dec-2010 alc

Introduce vm_fault_hold() and use it to (1) eliminate a long-standing race
condition in proc_rwmem() and to (2) simplify the implementation of the
cxgb driver's vm_fault_hold_user_pages(). Specifically, in proc_rwmem()
the requested read or write could fail because the targeted page could be
reclaimed between the calls to vm_fault() and vm_page_hold().

In collaboration with: kib@
MFC after: 6 weeks


# 216335 09-Dec-2010 mlaier

Fix a long standing (from the original 4.4BSD lite sources) race between
vmspace_fork and vm_map_wire that would lead to "vm_fault_copy_wired: page
missing" panics. While faulting in pages for a map entry that is being
wired down, mark the containing map as busy. In vmspace_fork wait until the
map is unbusy, before we try to copy the entries.

Reviewed by: kib
MFC after: 5 days
Sponsored by: Isilon Systems, Inc.


# 216128 02-Dec-2010 trasz

Replace pointer to "struct uidinfo" with pointer to "struct ucred"
in "struct vm_object". This is required to make it possible to account
for per-jail swap usage.

Reviewed by: kib@
Tested by: pho@
Sponsored by: FreeBSD Foundation


# 214144 21-Oct-2010 jhb

- Make 'vm_refcnt' volatile so that compilers won't be tempted to treat
its value as a loop invariant. Currently this is a no-op because
'atomic_cmpset_int()' clobbers all memory on current architectures.
- Use atomic_fetchadd_int() instead of an atomic_cmpset_int() loop to drop
a reference in vmspace_free().

Reviewed by: alc
MFC after: 1 month


# 212868 19-Sep-2010 alc

Make refinements to r212824. In particular, don't make
vm_map_unlock_nodefer() part of the synchronization interface for maps.

Add comments to vm_map_unlock_and_wait() and vm_map_wakeup() describing
how they should be used. In particular, describe the deferred deallocations
issue with vm_map_unlock_and_wait().

Redo the implementation of vm_map_unlock_and_wait() so that it passes
along the caller's file and line information, just like the other map
locking primitives.

Reviewed by: kib
X-MFC after: r212824


# 212824 18-Sep-2010 kib

Adopt the deferring of object deallocation for the deleted map entries
on map unlock to the lock downgrade and later read unlock operation.

System map entries cannot be backed by OBJT_VNODE objects, no need to
defer deallocation for them. Map entries from user maps do not require
the owner map for deallocation, and can be accumulated in the
thread-local list for freeing when a user map is unlocked.

Move the collection of entries for deferred reclamation into
vm_map_delete(). Create helper vm_map_process_deferred(), that is
called from locations where processing is feasible. Do not process
deferred entries in vm_map_unlock_and_wait() since map_sleep_mtx is
held.

Reviewed by: alc, rstone (previous versions)
Tested by: pho
MFC after: 2 weeks


# 206819 18-Apr-2010 jmallett

o) Add a VM find-space option, VMFS_TLB_ALIGNED_SPACE, which searches the
address space for an address as aligned by the new pmap_align_tlb()
function, which is for constraints imposed by the TLB. [1]
o) Add a kmem_alloc_nofault_space() function, which acts like
kmem_alloc_nofault() but allows the caller to specify which find-space
option to use. [1]
o) Use kmem_alloc_nofault_space() with VMFS_TLB_ALIGNED_SPACE to allocate the
kernel stack address on MIPS. [1]
o) Make pmap_align_tlb() on MIPS align addresses so that they do not start on
an odd boundary within the TLB, so that they are suitable for insertion as
wired entries and do not have to share a TLB entry with another mapping,
assuming they are appropriately-sized.
o) Eliminate md_realstack now that the kstack will be appropriately-aligned on
MIPS.
o) Increase the number of guard pages to 2 so that we retain the proper
alignment of the kstack address.

Reviewed by: [1] alc
X-MFC-after: Making sure alc has not come up with a better interface.


# 206142 03-Apr-2010 alc

Make _vm_map_init() the one place where the vm map's pmap field is
initialized.

Reviewed by: kib


# 199868 27-Nov-2009 alc

Simplify the invocation of vm_fault(). Specifically, eliminate the flag
VM_FAULT_DIRTY. The information provided by this flag can be trivially
inferred by vm_fault().

Discussed with: kib


# 199490 18-Nov-2009 alc

Simplify both the invocation and the implementation of vm_fault() for wiring
pages.

(Note: Claims made in the comments about the handling of breakpoints in
wired pages have been false for roughly a decade. This and another bug
involving breakpoints will be fixed in coming changes.)

Reviewed by: kib


# 194766 23-Jun-2009 kib

Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with: pho
Reviewed by: alc
Approved by: re (kensmith)


# 190886 10-Apr-2009 kib

When vm_map_wire(9) is allowed to skip holes in the wired region, skip
the mappings without any of read and execution rights, in particular,
the PROT_NONE entries. This makes mlockall(2) work for the process
address space that has such mappings.

Since protection mode of the entry may change between setting
MAP_ENTRY_IN_TRANSITION and final pass over the region that records
the wire status of the entries, allocate new map entry flag
MAP_ENTRY_WIRE_SKIPPED to mark the skipped PROT_NONE entries.

Reported and tested by: Hans Ottevanger <fbsdhackers beasties demon nl>
Reviewed by: alc
MFC after: 3 weeks


# 189015 24-Feb-2009 kib

Revert the addition of the freelist argument for the vm_map_delete()
function, done in r188334. Instead, collect the entries that shall be
freed, in the deferred_freelist member of the map. Automatically purge
the deferred freelist when map is unlocked.

Tested by: pho
Reviewed by: alc


# 188334 08-Feb-2009 kib

Do not call vm_object_deallocate() from vm_map_delete(), because we
hold the map lock there, and might need the vnode lock for OBJT_VNODE
objects. Postpone object deallocation until caller of vm_map_delete()
drops the map lock. Link the map entries to be freed into the freelist,
that is released by the new helper function vm_map_entry_free_freelist().

Reviewed by: tegge, alc
Tested by: pho


# 186665 31-Dec-2008 alc

Resurrect shared map locks allowing greater concurrency during some map
operations, such as page faults.

An earlier version of this change was ...

Reviewed by: kib
Tested by: pho
MFC after: 6 weeks


# 186633 31-Dec-2008 alc

Update or eliminate some stale comments.


# 178928 10-May-2008 alc

Generalize vm_map_find(9)'s parameter "find_space". Specifically, add
support for VMFS_ALIGNED_SPACE, which requests the allocation of an
address range best suited to superpages. The old options TRUE and FALSE
are mapped to VMFS_ANY_SPACE and VMFS_NO_SPACE, so that there is no
immediate need to update all of vm_map_find(9)'s callers.

While I'm here, correct a misstatement about vm_map_find(9)'s return
values in the man page.


# 178630 28-Apr-2008 alc

vm_map_fixed(), unlike vm_map_find(), does not update "addr", so it can be
passed by value.


# 176717 01-Mar-2008 marcel

Make the vm_pmap field of struct vmspace the last field in the
structure. This allows per-CPU variations of struct pmap on a
single architecture without affecting the machine-independent
fields. As such, the PMAP variations don't affect the ABI. They
become part of it.


# 173429 07-Nov-2007 pjd

Change unused 'user_wait' argument to 'timo' argument, which will be
used to specify timeout for msleep(9).

Discussed with: alc
Reviewed by: alc


# 171902 20-Aug-2007 kib

Do not drop vm_map lock between doing vm_map_remove() and vm_map_insert().
For this, introduce vm_map_fixed() that does that for MAP_FIXED case.

Dropping the lock allowed for parallel thread to occupy the freed space.

Reported by: Tijl Coosemans <tijl ulyssis org>
Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks


# 159054 29-May-2006 tegge

Close race between vmspace_exitfree() and exit1() and races between
vmspace_exitfree() and vmspace_free() which could result in the same
vmspace being freed twice.

Factor out part of exit1() into new function vmspace_exit(). Attach
to vmspace0 to allow old vmspace to be freed earlier.

Add new function, vmspace_acquire_ref(), for obtaining a vmspace
reference for a vmspace belonging to another process. Avoid changing
vmspace refcount from 0 to 1 since that could also lead to the same
vmspace being freed twice.

Change vmtotal() and swapout_procs() to use vmspace_acquire_ref().

Reviewed by: alc


# 153068 03-Dec-2005 alc

Eliminate unneeded preallocation at initialization.

Reviewed by: tegge


# 139825 07-Jan-2005 imp

/* -> /*- for license, minor formatting changes


# 133636 13-Aug-2004 alc

Replace the linear search in vm_map_findspace() with an O(log n)
algorithm built into the map entry splay tree. This replaces the
first_free hint in struct vm_map with two fields in vm_map_entry:
adj_free, the amount of free space following a map entry, and
max_free, the maximum amount of free space in the entry's subtree.
These fields make it possible to find a first-fit free region of a
given size in one pass down the tree, so O(log n) amortized using
splay trees.

This significantly reduces the overhead in vm_map_findspace() for
applications that mmap() many hundreds or thousands of regions, and
has a negligible slowdown (0.1%) on buildworld. See, for example, the
discussion of a micro-benchmark titled "Some mmap observations
compared to Linux 2.6/OpenBSD" on -hackers in late October 2003.

OpenBSD adopted this approach in March 2002, and NetBSD added it in
November 2003, both with Red-Black trees.

Submitted by: Mark W. Krentel


# 133598 12-Aug-2004 tegge

The vm map lock is needed in vm_fault() after the page has been found,
to avoid later changes before pmap_enter() and vm_fault_prefault()
has completed.

Simplify deadlock avoidance by not blocking on vm map relookup.

In collaboration with: alc


# 133401 09-Aug-2004 green

Revamp VM map wiring.

* Allow no-fault wiring/unwiring to succeed for consistency;
however, the wired count remains at zero, so it's a special case.

* Fix issues inside vm_map_wire() and vm_map_unwire() where the
exact state of user wiring (one or zero) and system wiring
(zero or more) could be confused; for example, system unwiring
could succeed in removing a user wire, instead of being an
error.

* Require all mappings to be unwired before they are deleted.
When VM space is still wired upon deletion, it will be waited
upon for the following unwire. This makes vslock(9) work
rather than allowing kernel-locked memory to be deleted
out from underneath of its consumer as it would before.


# 132880 30-Jul-2004 mux

Get rid of another lockmgr(9) consumer by using sx locks for the user
maps. We always acquire the sx lock exclusively here, but we can't
use a mutex because we want to be able to sleep while holding the
lock. This is completely equivalent to what we were doing with the
lockmgr(9) locks before.

Approved by: alc


# 132593 24-Jul-2004 alc

Simplify vmspace initialization. The bcopy() of fields from the old
vmspace to the new vmspace in vmspace_exec() is mostly wasted effort. With
one exception, vm_swrss, the copied fields are immediately overwritten.
Instead, initialize these fields to zero in vmspace_alloc(), eliminating a
bcopy() from vmspace_exec() and a bzero() from vmspace_fork().


# 131719 06-Jul-2004 alc

Micro-optimize vmspace for 64-bit architectures: Colocate vm_refcnt and
vm_exitingcnt so that alignment does not result in wasted space.


# 131152 26-Jun-2004 alc

Remove an unused field from the vmspace structure.


# 128596 24-Apr-2004 alc

In cases where a file was resident in memory mmap(..., PROT_NONE, ...)
would actually map the file with read access enabled. According to
http://www.opengroup.org/onlinepubs/007904975/functions/mmap.html this is
an error. Similarly, an madvise(..., MADV_WILLNEED) would enable read
access on a virtual address range that was PROT_NONE.

The solution implemented herein is (1) to pass a vm_prot_t to
vm_map_pmap_enter() describing the allowed access and (2) to make
vm_map_pmap_enter() responsible for understanding the limitations of
pmap_enter_quick().

Submitted by: "Mark W. Krentel" <krentel@dreamscape.com>
PR: kern/64573


# 127961 06-Apr-2004 imp

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# 126865 11-Mar-2004 peter

Part 2 of rev 1.68. Update comment to match reality now that vm_endcopy
exists and we no longer copy to the end of the struct.

Forgotten by: alfred and green


# 122349 09-Nov-2003 alc

- Rename vm_map_clean() to vm_map_sync(). This better reflects the fact
that msync(2) is its only caller.
- Migrate the parts of the old vm_map_clean() that examined the internals
of a vm object to a new function vm_object_sync() that is implemented in
vm_object.c. At the same, introduce the necessary vm object locking so
that vm_map_sync() and vm_object_sync() can be called without Giant.

Reviewed by: tegge


# 121962 03-Nov-2003 des

Whitespace cleanup.


# 120831 05-Oct-2003 bms

Move pmap_resident_count() from the MD pmap.h to the MI pmap.h.
Add a definition of pmap_wired_count().
Add a definition of vmspace_wired_count().

Reviewed by: truckman
Discussed with: peter


# 120531 27-Sep-2003 marcel

Part 2 of implementing rstacks: add the ability to create rstacks and
use the ability on ia64 to map the register stack. The orientation of
the stack (i.e. its grow direction) is passed to vm_map_stack() in the
overloaded cow argument. Since the grow direction is represented by
bits, it is possible and allowed to create bi-directional stacks.
This is not an advertised feature, more of a side-effect.

Fix a bug in vm_map_growstack() that's specific to rstacks and which
we could only find by having the ability to create rstacks: when
the mapped stack ends at the faulting address, we have not actually
mapped the faulting address. we need to include or cover the faulting
address.

Note that at this time mmap(2) has not been extended to allow the
creation of rstacks by processes. If such a need arises, this can
be done.

Tested on: alpha, i386, ia64, sparc64


# 119595 30-Aug-2003 marcel

Introduce MAP_ENTRY_GROWS_DOWN and MAP_ENTRY_GROWS_UP to allow for
growable (stack) entries that not only grow down, but also grow up.
Have vm_map_growstack() take these flags into account when growing
an entry.

This is the first step in adding support for upward growable stacks.
It is a required feature on ia64 to support the register stack (or
rstack as I like to call it -- it also means reverse stack). We do
not currently create rstacks, so the upward growing is not exercised
and the change should be a functional no-op.

Reviewed by: alc


# 118852 13-Aug-2003 alc

Reduce the size of the vm map (and by inclusion the vm space) on 64-bit
architectures by moving a field within the structure.


# 118771 11-Aug-2003 bms

Add the mlockall() and munlockall() system calls.
- All those diffs to syscalls.master for each architecture *are*
necessary. This needed clarification; the stub code generation for
mlockall() was disabled, which would prevent applications from
linking to this API (suggested by mux)
- Giant has been quoshed. It is no longer held by the code, as
the required locking has been pushed down within vm_map.c.
- Callers must specify VM_MAP_WIRE_HOLESOK or VM_MAP_WIRE_NOHOLES
to express their intention explicitly.
- Inspected at the vmstat, top and vm pager sysctl stats level.
Paging-in activity is occurring correctly, using a test harness.
- The RES size for a process may appear to be greater than its SIZE.
This is believed to be due to mappings of the same shared library
page being wired twice. Further exploration is needed.
- Believed to back out of allocations and locks correctly
(tested with WITNESS, MUTEX_PROFILING, INVARIANTS and DIAGNOSTIC).

PR: kern/43426, standards/54223
Reviewed by: jake, alc
Approved by: jake (mentor)
MFC after: 2 weeks


# 117047 29-Jun-2003 alc

Introduce vm_map_pmap_enter(). Presently, this is a stub calling the MD
pmap_object_init_pt().


# 116387 15-Jun-2003 alc

Remove an unnecessary forward declaration.


# 112167 12-Mar-2003 das

- When the VM daemon is out of swap space and looking for a
process to kill, don't block on a map lock while holding the
process lock. Instead, skip processes whose map locks are held
and find something else to kill.
- Add vm_map_trylock_read() to support the above.

Reviewed by: alc, mike (mentor)


# 111937 06-Mar-2003 alc

Remove ENABLE_VFS_IOOPT. It is a long unfinished work-in-progress.

Discussed on: arch@


# 108518 31-Dec-2002 alc

Add a needed #include.

Reported by: ia64 tinderbox


# 108515 31-Dec-2002 alc

Implement a variant locking scheme for vm maps: Access to system maps
is now synchronized by a mutex, whereas access to user maps is still
synchronized by a lockmgr()-based lock. Why? No single type of lock,
including sx locks, meets the requirements of both types of vm map.
Sometimes we sleep while holding the lock on a user map. Thus, a
a mutex isn't appropriate. On the other hand, both lockmgr()-based
and sx locks release Giant when a thread/process blocks during
contention for a lock. This could lead to a race condition in a legacy
driver (that relies on Giant for synchronization) if it attempts to
kmem_malloc() and fails to immediately obtain the lock. Fortunately,
we never sleep while holding a system map lock.


# 107912 15-Dec-2002 dillon

Fix a refcount race with the vmspace structure. In order to prevent
resource starvation we clean-up as much of the vmspace structure as we
can when the last process using it exits. The rest of the structure
is cleaned up when it is reaped. But since exit1() decrements the ref
count it is possible for a double-free to occur if someone else, such as
the process swapout code, references and then dereferences the structure.
Additionally, the final cleanup of the structure should not occur until
the last process referencing it is reaped.

This commit solves the problem by introducing a secondary reference count,
calling 'vm_exitingcnt'. The normal reference count is decremented on exit
and vm_exitingcnt is incremented. vm_exitingcnt is decremented when the
process is reaped. When both vm_exitingcnt and vm_refcnt are 0, the
structure is freed for real.

MFC after: 3 weeks


# 103777 22-Sep-2002 alc

o Update some comments.


# 100512 22-Jul-2002 alfred

Change struct vmspace->vm_shm from void * to struct shmmap_state *, this
removes the need for casts in several cases.


# 100511 22-Jul-2002 alfred

Remove caddr_t.


# 99754 11-Jul-2002 alc

o Add a "needs wakeup" flag to the vm_map for use by kmem_alloc_wait()
and kmem_free_wakeup(). Previously, kmem_free_wakeup() always
called wakeup(). In general, no one was sleeping.
o Export vm_map_unlock_and_wait() and vm_map_wakeup() from vm_map.c
for use in vm_kern.c.


# 98818 25-Jun-2002 alc

o Eliminate vmspace::vm_minsaddr. It's initialized but never used.
o Replace stale comments in vmspace by "const until freed" annotations
on some fields.


# 98226 14-Jun-2002 alc

o Use vm_map_wire() and vm_map_unwire() in place of vm_map_pageable() and
vm_map_user_pageable().
o Remove vm_map_pageable() and vm_map_user_pageable().
o Remove vm_map_clear_recursive() and vm_map_set_recursive(). (They were
only used by vm_map_pageable() and vm_map_user_pageable().)

Reviewed by: tegge


# 98036 08-Jun-2002 alc

o Remove an unnecessary call to vm_map_wakeup() from vm_map_unwire().
o Add a stub for vm_map_wire().

Note: the description of the previous commit had an error. The in-
transition flag actually blocks the deallocation of a vm_map_entry by
vm_map_delete() and vm_map_simplify_entry().


# 98022 07-Jun-2002 alc

o Add vm_map_unwire() for unwiring contiguous regions of either kernel
or user vm_maps. In accordance with the standards for munlock(2),
and in contrast to vm_map_user_pageable(), this implementation does not
allow holes in the specified region. This implementation uses the
"in transition" flag described below.
o Introduce a new flag, "in transition," to the vm_map_entry.
Eventually, vm_map_delete() and vm_map_simplify_entry() will respect
this flag by deallocating in-transition vm_map_entrys, allowing
the vm_map lock to be safely released in vm_map_unwire() and (the
forthcoming) vm_map_wire().
o Modify vm_map_simplify_entry() to respect the in-transition flag.

In collaboration with: tegge


# 97727 01-Jun-2002 alc

o Remove GIANT_REQUIRED from vm_map_zfini(), vm_map_zinit(),
vm_map_create(), and vm_map_submap().
o Make further use of a local variable in vm_map_entry_splay()
that caches a reference to one of a vm_map_entry's children.
(This reduces code size somewhat.)
o Revert a part of revision 1.66, deinlining vmspace_pmap().
(This function is MPSAFE.)


# 97710 01-Jun-2002 alc

o Revert a part of revision 1.66, contrary to what that commit message says,
deinlining vm_map_entry_behavior() and vm_map_entry_set_behavior()
actually increases the kernel's size.
o Make vm_map_entry_set_behavior() static and add a comment describing
its purpose.
o Remove an unnecessary initialization statement from vm_map_entry_splay().


# 97198 23-May-2002 alc

o Replace the vm_map's hint by the root of a splay tree. By design,
the last accessed datum is moved to the root of the splay tree.
Therefore, on lookups in which the hint resulted in O(1) access,
the splay tree still achieves O(1) access. In contrast, on lookups
in which the hint failed miserably, the splay tree achieves amortized
logarithmic complexity, resulting in dramatic improvements on vm_maps
with a large number of entries. For example, the execution time
for replaying an access log from www.cs.rice.edu against the thttpd
web server was reduced by 23.5% due to the large number of files
simultaneously mmap()ed by this server. (The machine in question has
enough memory to cache most of this workload.)

Nothing comes for free: At present, I see a 0.2% slowdown on "buildworld"
due to the overhead of maintaining the splay tree. I believe that
some or all of this can be eliminated through optimizations
to the code.

Developed in collaboration with: Juan E Navarro <jnavarro@cs.rice.edu>
Reviewed by: jeff


# 96096 06-May-2002 alc

o Header files shouldn't depend on options: Provide prototypes
for uiomoveco(), uioread(), and vm_uiomove() regardless
of whether ENABLE_VFS_IOOPT is defined or not.

Submitted by: bde


# 96087 05-May-2002 alc

o Move vm_freeze_copyopts() from vm_map.{c.h} to vm_object.{c,h}. It's plainly
an operation on a vm_object and belongs in the latter place.


# 96080 05-May-2002 alc

o Condition the compilation of uiomoveco() and vm_uiomove()
on ENABLE_VFS_IOOPT.
o Add a comment to the effect that this code is experimental
support for zero-copy I/O.


# 95901 02-May-2002 alc

o Remove dead and lockmgr()-specific debugging code.


# 95686 28-Apr-2002 alc

Pass the caller's file name and line number to the vm_map locking functions.


# 95610 28-Apr-2002 alc

o Introduce and use vm_map_trylock() to replace several direct uses
of lockmgr().
o Add missing synchronization to vmspace_swap_count(): Obtain a read lock
on the vm_map before traversing it.


# 95589 27-Apr-2002 alc

o Begin documenting the (existing) locking protocol on the vm_map
in the same style as sys/proc.h.
o Undo the de-inlining of several trivial, MPSAFE methods on the vm_map.
(Contrary to the commit message for vm_map.h revision 1.66 and vm_map.c
revision 1.206, de-inlining these methods increased the kernel's size.)


# 94912 17-Apr-2002 alc

Remove an unused option, VM_FAULT_HOLD, to vm_fault().


# 92654 19-Mar-2002 jeff

This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by: arch@


# 92588 18-Mar-2002 green

Back out the modification of vm_map locks from lockmgr to sx locks. The
best path forward now is likely to change the lockmgr locks to simple
sleep mutexes, then see if any extra contention it generates is greater
than removed overhead of managing local locking state information,
cost of extra calls into lockmgr, etc.

Additionally, making the vm_map lock a mutex and respecting it properly
will put us much closer to not needing Giant magic in vm.


# 92246 13-Mar-2002 green

Rename SI_SUB_MUTEX to SI_SUB_MTX_POOL to make the name at all accurate.
While doing this, move it earlier in the sysinit boot process so that the
VM system can use it.

After that, the system is now able to use sx locks instead of lockmgr
locks in the VM system. To accomplish this, some of the more
questionable uses of the locks (such as testing whether they are
owned or not, as well as allowing shared+exclusive recursion) are
removed, and simpler logic throughout is used so locks should also be
easier to understand.

This has been tested on my laptop for months, and has not shown any
problems on SMP systems, either, so appears quite safe. One more
user of lockmgr down, many more to go :)


# 92029 10-Mar-2002 eivind

- Remove a number of extra newlines that do not belong here according to
style(9)
- Minor space adjustment in cases where we have "( ", " )", if(), return(),
while(), for(), etc.
- Add /* SYMBOL */ after a few #endifs.

Reviewed by: alc


# 90263 05-Feb-2002 alfred

Fix a race with free'ing vmspaces at process exit when vmspaces are
shared.

Also introduce vm_endcopy instead of using pointer tricks when
initializing new vmspaces.

The race occured because of how the reference was utilized:
test vmspace reference,
possibly block,
decrement reference

When sharing a vmspace between multiple processes it was possible
for two processes exiting at the same time to test the reference
count, possibly block and neither one free because they wouldn't
see the other's update.

Submitted by: green


# 85762 31-Oct-2001 dillon

Don't let pmap_object_init_pt() exhaust all available free pages
(allocating pv entries w/ zalloci) when called in a loop due to
an madvise(). It is possible to completely exhaust the free page list and
cause a system panic when an expected allocation fails.


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 79248 04-Jul-2001 dillon

Change inlines back into mainline code in preparation for mutexing. Also,
most of these inlines had been bloated in -current far beyond their
original intent. Normalize prototypes and function declarations to be ANSI
only (half already were). And do some general cleanup.

(kernel size also reduced by 50-100K, but that isn't the prime intent)


# 79224 04-Jul-2001 dillon

With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.


# 77948 09-Jun-2001 dillon

Two fixes to the out-of-swap process termination code. First, start killing
processes a little earlier to avoid a deadlock. Second, when calculating
the 'largest process' do not just count RSS. Instead count the RSS + SWAP
used by the process. Without this the code tended to kill small
inconsequential processes like, oh, sshd, rather then one of the many
'eatmem 200MB' I run on a whim :-). This fix has been extensively tested on
-stable and somewhat tested on -current and will be MFCd in a few days.

Shamed into fixing this by: ps


# 76827 18-May-2001 alfred

Introduce a global lock for the vm subsystem (vm_mtx).

vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb


# 76244 03-May-2001 markm

Putting sys/lockmgr.h in here allows us to depollute userland includes
a bit.
OK'ed by: bde


# 75644 18-Apr-2001 alfred

Fix the botched rev 1.59 where I made it such that without INVARIANTS
the map is never locked.

Submitted by: tegge


# 75473 13-Apr-2001 alfred

use %p for pointer printf, include sys/systm.h for printf proto


# 75462 13-Apr-2001 alfred

Use a macro wrapper over printf along with KASSERT to reduce the amount
of code here.


# 74237 14-Mar-2001 dillon

Fix a lock reversal problem in the VM subsystem related to threaded
programs. There is a case during a fork() which can cause a deadlock.

From Tor -
The workaround that consists of setting a flag in the vm map that
indicates that a fork is in progress and using that mark in the page
fault handling to force a revalidation failure. That change will only
affect (pessimize) page fault handling during fork for threaded
(linuxthreads style) applications and applications using aio_*().

Submited by: tegge


# 72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


# 67046 12-Oct-2000 jasone

For lockmgr mutex protection, use an array of mutexes that are allocated
and initialized during boot. This avoids bloating sizeof(struct lock).
As a side effect, it is no longer necessary to enforce the assumtion that
lockinit()/lockdestroy() calls are paired, so the LK_VALID flag has been
removed.

Idea taken from: BSD/OS.


# 66615 03-Oct-2000 jasone

Convert lockmgr locks from using simple locks to using mutexes.

Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.


# 57550 28-Feb-2000 ps

Add MAP_NOCORE to mmap(2), and MADV_NOCORE and MADV_CORE to madvise(2).
This
This feature allows you to specify if mmap'd data is included in
an application's corefile.

Change the type of eflags in struct vm_map_entry from u_char to
vm_eflags_t (an unsigned int).

Reviewed by: dillon,jdp,alfred
Approved by: jkh


# 55206 29-Dec-1999 peter

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.


# 54467 12-Dec-1999 dillon

Add MAP_NOSYNC feature to mmap(), and MADV_NOSYNC and MADV_AUTOSYNC to
madvise().

This feature prevents the update daemon from gratuitously flushing
dirty pages associated with a mapped file-backed region of memory. The
system pager will still page the memory as necessary and the VM system
will still be fully coherent with the filesystem. Modifications made
by other means to the same area of memory, for example by write(), are
unaffected. The feature works on a page-granularity basis.

MAP_NOSYNC allows one to use mmap() to share memory between processes
without incuring any significant filesystem overhead, putting it in
the same performance category as SysV Shared memory and anonymous memory.

Reviewed by: julian, alc, dg


# 51493 21-Sep-1999 dillon

cleanup madvise code, add a few more sanity checks.

Reviewed by: Alan Cox <alc@cs.rice.edu>, dg@root.com


# 51342 17-Sep-1999 dillon

Add 'lastr' field to vm_map_entry in preparation for its removal
from the vnode. (The changeover is undergoing final testing and
will be committed soon).

Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 50248 23-Aug-1999 alc

Correct the inconsistent formatting in struct vm_map.

Addendum to rev 1.47: submitted by dillon.


# 50247 23-Aug-1999 alc

struct vm_map:
The lock structure cannot be the first element of the vm_map
because this can result in livelock between two or more system
processes trying to kmem_alloc_wait.


# 49998 18-Aug-1999 mjacob

Fix breakage - an extra brace got inserted where DIAGNOSTIC was defined
but MAP_LOCK_DIAGNOSTIC wasn't.


# 49900 16-Aug-1999 alc

vm_map_lock*:
Remove semicolons or add "do { } while (0)" as necessary
to enable the use of these macros in arbitrary statements.
(There are no functional changes.)

Submitted by: dillon


# 49338 01-Aug-1999 alc

Move the memory access behavior information provided by madvise
from the vm_object to the vm_map.

Submitted by: dillon


# 48736 10-Jul-1999 alc

Remove unused function prototypes.


# 48022 19-Jun-1999 alc

Remove some unused function and variable declarations.


# 47258 16-May-1999 alc

Add the options MAP_PREFAULT and MAP_PREFAULT_PARTIAL to vm_map_find/insert,
eliminating the need for the pmap_object_init_pt calls in imgact_* and
mmap.

Reviewed by: David Greenman <dg@root.com>


# 47243 16-May-1999 alc

Remove prototypes for functions that don't exist anymore (vm_map.h).

Remove a useless argument from vm_map_madvise's interface (vm_map.c,
vm_map.h, and vm_mmap.c).

Remove a redundant test in vm_uiomove (vm_map.c).

Make two changes to vm_object_coalesce:

1. Determine whether the new range of pages actually overlaps
the existing object's range of pages before calling vm_object_page_remove.
(Prior to this change almost 90% of the calls to vm_object_page_remove
were to remove pages that were beyond the end of the object.)

2. Free any swap space allocated to removed pages.


# 47207 14-May-1999 alc

Simplify vm_map_find/insert's interface: remove the MAP_COPY_NEEDED option.

It never makes sense to specify MAP_COPY_NEEDED without also specifying
MAP_COPY_ON_WRITE, and vice versa. Thus, MAP_COPY_ON_WRITE suffices.

Reviewed by: David Greenman <dg@root.com>


# 44513 06-Mar-1999 alc

Upgrading a map's lock to exclusive status should increment
the map's timestamp. In general, whenever an exclusive lock is
acquired the timestamp should be incremented.


# 44396 02-Mar-1999 alc

Remove the last of the share map code: struct vm_map::is_main_map.

Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>


# 44146 19-Feb-1999 luoqi

Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This
is the preparation step for moving pmap storage out of vmspace proper.

Reviewed by: Alan Cox <alc@cs.rice.edu>
Matthew Dillion <dillon@apollo.backplane.com>


# 43748 07-Feb-1999 dillon

Remove MAP_ENTRY_IS_A_MAP 'share' maps. These maps were once used to
attempt to optimize forks but were essentially given-up on due to
problems and replaced with an explicit dup of the vm_map_entry structure.
Prior to the removal, they were entirely unused.


# 43209 26-Jan-1999 julian

Mostly remove the VM_STACK OPTION.
This changes the definitions of a few items so that structures are the
same whether or not the option itself is enabled. This allows
people to enable and disable the option without recompilng the world.

As the author says:

|I ran into a problem pulling out the VM_STACK option. I was aware of this
|when I first did the work, but then forgot about it. The VM_STACK stuff
|has some code changes in the i386 branch. There need to be corresponding
|changes in the alpha branch before it can come out completely.

what is done:
|
|1) Pull the VM_STACK option out of the header files it appears in. This
|really shouldn't affect anything that executes with or without the rest
|of the VM_STACK patches. The vm_map_entry will then always have one
|extra element (avail_ssize). It just won't be used if the VM_STACK
|option is not turned on.
|
|I've also pulled the option out of vm_map.c. This shouldn't harm anything,
|since the routines that are enabled as a result are not called unless
|the VM_STACK option is enabled elsewhere.
|
|2) Add what appears to be appropriate code the the alpha branch, still
|protected behind the VM_STACK switch. I don't have an alpha machine,
|so we would need to get some testers with alpha machines to try it out.
|
|Once there is some testing, we can consider making the change permanent
|for both i386 and alpha.
|
[..]
|
|Once the alpha code is adequately tested, we can pull VM_STACK out
|everywhere.
|

Submitted by: "Richard Seaman, Jr." <dick@tar.com>


# 42360 06-Jan-1999 julian

Add (but don't activate) code for a special VM option to make
downward growing stacks more general.
Add (but don't activate) code to use the new stack facility
when running threads, (specifically the linux threads support).
This allows people to use both linux compiled linuxthreads, and also the
native FreeBSD linux-threads port.

The code is conditional on VM_STACK. Not using this will
produce the old heavily tested system.

Submitted by: Richard Seaman <dick@tar.com>


# 32702 22-Jan-1998 dyson

VM level code cleanups.

1) Start using TSM.
Struct procs continue to point to upages structure, after being freed.
Struct vmspace continues to point to pte object and kva space for kstack.
u_map is now superfluous.
2) vm_map's don't need to be reference counted. They always exist either
in the kernel or in a vmspace. The vmspaces are managed by reference
counts.
3) Remove the "wired" vm_map nonsense.
4) No need to keep a cache of kernel stack kva's.
5) Get rid of strange looking ++var, and change to var++.
6) Change more data structures to use our "zone" allocator. Added
struct proc, struct vmspace and struct vnode. This saves a significant
amount of kva space and physical memory. Additionally, this enables
TSM for the zone managed memory.
7) Keep ioopt disabled for now.
8) Remove the now bogus "single use" map concept.
9) Use generation counts or id's for data structures residing in TSM, where
it allows us to avoid unneeded restart overhead during traversals, where
blocking might occur.
10) Account better for memory deficits, so the pageout daemon will be able
to make enough memory available (experimental.)
11) Fix some vnode locking problems. (From Tor, I think.)
12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
(experimental.)
13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c
code. Use generation counts, get rid of unneded collpase operations,
and clean up the cluster code.
14) Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step. Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)


# 32585 17-Jan-1998 dyson

Tie up some loose ends in vnode/object management. Remove an unneeded
config option in pmap. Fix a problem with faulting in pages. Clean-up
some loose ends in swap pager memory management.

The system should be much more stable, but all subtile bugs aren't fixed yet.


# 32286 06-Jan-1998 dyson

Make our v_usecount vnode reference count work identically to the
original BSD code. The association between the vnode and the vm_object
no longer includes reference counts. The major difference is that
vm_object's are no longer freed gratuitiously from the vnode, and so
once an object is created for the vnode, it will last as long as the
vnode does.

When a vnode object reference count is incremented, then the underlying
vnode reference count is incremented also. The two "objects" are now
more intimately related, and so the interactions are now much less
complex.

When vnodes are now normally placed onto the free queue with an object still
attached. The rundown of the object happens at vnode rundown time, and
happens with exactly the same filesystem semantics of the original VFS
code. There is absolutely no need for vnode_pager_uncache and other
travesties like that anymore.

A side-effect of these changes is that SMP locking should be much simpler,
the I/O copyin/copyout optimizations work, NFS should be more ponderable,
and further work on layered filesystems should be less frustrating, because
of the totally coherent management of the vnode objects and vnodes.

Please be careful with your system while running this code, but I would
greatly appreciate feedback as soon a reasonably possible.


# 31853 19-Dec-1997 dyson

Some performance improvements, and code cleanups (including changing our
expensive OFF_TO_IDX to btoc whenever possible.)


# 28345 18-Aug-1997 dyson

Fix kern_lock so that it will work. Additionally, clean-up some of the
VM systems usage of the kernel lock (lockmgr) code. This is a first
pass implementation, and is expected to evolve as needed. The API
for the lock manager code has not changed, but the underlying implementation
has changed significantly. This change should not materially affect
our current SMP or UP code without non-standard parameters being used.


# 27899 04-Aug-1997 dyson

Get rid of the ad-hoc memory allocator for vm_map_entries, in lieu of
a simple, clean zone type allocator. This new allocator will also be
used for machine dependent pmap PV entries.


# 24691 07-Apr-1997 peter

The biggie: Get rid of the UPAGES from the top of the per-process address
space. (!)

Have each process use the kernel stack and pcb in the kvm space. Since
the stacks are at a different address, we cannot copy the stack at fork()
and allow the child to return up through the function call tree to return
to user mode - create a new execution context and have the new process
begin executing from cpu_switch() and go to user mode directly.
In theory this should speed up fork a bit.

Context switch the tss_esp0 pointer in the common tss. This is a lot
simpler since than swithching the gdt[GPROC0_SEL].sd.sd_base pointer
to each process's tss since the esp0 pointer is a 32 bit pointer, and the
sd_base setting is split into three different bit sections at non-aligned
boundaries and requires a lot of twiddling to reset.

The 8K of memory at the top of the process space is now empty, and unmapped
(and unmappable, it's higher than VM_MAXUSER_ADDRESS).

Simplity the pmap code to manage process contexts, we no longer have to
double map the UPAGES, this simplifies and should measuably speed up fork().

The following parts came from John Dyson:

Set PG_G on the UPAGES that are now in kernel context, and invalidate
them when swapping them out.

Move the upages object (upobj) from the vmspace to the proc structure.

Now that the UPAGES (pcb and kernel stack) are out of user space, make
rfork(..RFMEM..) do what was intended by sharing the vmspace
entirely via reference counting rather than simply inheriting the mappings.


# 24666 06-Apr-1997 dyson

Fix the gdb executable modify problem. Thanks to the detective work
by Alan Cox <alc@cs.rice.edu>, and his description of the problem.

The bug was primarily in procfs_mem, but the mistake likely happened
due to the lack of vm system support for the operation. I added
better support for selective marking of page dirty flags so that
vm_map_pageable(wiring) will not cause this problem again.

The code in procfs_mem is now less bogus (but maybe still a little
so.)


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 22878 18-Feb-1997 bde

Removed vestiges of Mach lock types.

vm_map.h:
Removed #include of <sys/proc.h>. curproc is only used in some macros
and users of the macros already include <sys/proc.h>.


# 22521 10-Feb-1997 dyson

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 21754 16-Jan-1997 dyson

Change the map entry flags from bitfields to bitmasks. Allows
for some code simplification.


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 20993 28-Dec-1996 dyson

Eliminate the redundancy due to the similarity between the routines
vm_map_simplify and vm_map_simplify_entry. Make vm_map_simplify_entry
handle wired maps so that we can get rid of vm_map_simplify. Modify
the callers of vm_map_simplify to properly use vm_map_simplify_entry.
Submitted by: Alan Cox <alc@cs.rice.edu>


# 20449 14-Dec-1996 dyson

Implement closer-to POSIX mlock semantics. The major difference is
that we do allow mlock to span unallocated regions (of course, not
mlocking them.) We also allow mlocking of RO regions (which the old
code couldn't.) The restriction there is that once a RO region is
wired (mlocked), it cannot be debugged (or EVER written to.)

Under normal usage, the new mlock code will be a significant improvement
over our old stuff.


# 20182 06-Dec-1996 dyson

Make vm_map_insert much more intelligent in the MAP_NOFAULT case so
that map entries are coalesced when appropriate. Also, conditionalize
some code that is currently not used in vm_map_insert. This mod
has been added to eliminate unnecessary map entries in buffer map.

Additionally, there were some cases where map coalescing could be done
when it shouldn't. That problem has been resolved.


# 20054 30-Nov-1996 dyson

Implement a new totally dynamic (up to MAXPHYS) buffer kva allocation
scheme. Additionally, add the capability for checking for unexpected
kernel page faults. The maximum amount of kva space for buffers hasn't
been decreased from where it is, but it will now be possible to do so.

This scheme manages the kva space similar to the buffers themselves. If
there isn't enough kva space because of usage or fragementation, buffers
will be reclaimed until a buffer allocation is successful. This scheme
should be very resistant to fragmentation problems until/if the LFS code
is fixed and uses the bogus buffer locking scheme -- but a 'fixed' LFS
is not likely to use such a scheme.

Now there should be NO problem allocating buffers up to MAXPHYS.


# 17334 30-Jul-1996 dyson

Backed out the recent changes/enhancements to the VM code. The
problem with the 'shell scripts' was found, but there was a 'strange'
problem found with a 486 laptop that we could not find. This commit
backs the code back to 25-jul, and will be re-entered after the snapshot
in smaller (more easily tested) chunks.


# 17294 27-Jul-1996 dyson

This commit is meant to solve a couple of VM system problems or
performance issues.

1) The pmap module has had too many inlines, and so the
object file is simply bigger than it needs to be.
Some common code is also merged into subroutines.
2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls.
Unfortunately, a few have needed to be added also.
The removal caused the need for more vm_page_lookups.
I added lookup hints to minimize the need for the
page table lookup operations.
3) Removal of some bogus performance improvements, that
mostly made the code more complex (tracking individual
page table page updates unnecessarily). Those improvements
actually hurt 386 processors perf (not that people who
worry about perf use 386 processors anymore :-)).
4) Changed pv queue manipulations/structures to be TAILQ's.
5) The pv queue code has had some performance problems since
day one. Some significant scalability issues are resolved
by threading the pv entries from the pmap AND the physical
address instead of just the physical address. This makes
certain pmap operations run much faster. This does
not affect most micro-benchmarks, but should help loaded system
performance *significantly*. DG helped and came up with most
of the solution for this one.
6) Most if not all pmap bit operations follow the pattern:
pmap_test_bit();
pmap_clear_bit();
That made for twice the necessary pv list traversal. The
pmap interface now supports only pmap_tc_bit type operations:
pmap_[test/clear]_modified, pmap_[test/clear]_referenced.
Additionally, the modified routine now takes a vm_page_t arg
instead of a phys address. This eliminates a PHYS_TO_VM_PAGE
operation.
7) Several rewrites of routines that contain redundant code to
use common routines, so that there is a greater likelihood of
keeping the cache footprint smaller.


# 15819 19-May-1996 dyson

Initial support for mincore and madvise. Both are almost fully
supported, except madvise does not page in with MADV_WILLNEED, and
MADV_DONTNEED doesn't force dirty pages out.


# 13765 30-Jan-1996 mpp

Fix a bunch of spelling errors in the comment fields of
a bunch of system include files.


# 13490 19-Jan-1996 dyson

Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
overhead for merged cache.
Efficiency improvement for vfs_cluster. It used to do alot of redundant
calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files. Additionally,
fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources. The pageout code
will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
that happens every 30seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
buffers.


# 12820 14-Dec-1995 phk

Another mega commit to staticize things.


# 12767 11-Dec-1995 dyson

Changes to support 1Tb filesizes. Pages are now named by an
(object,index) pair instead of (object,offset) pair.


# 12662 07-Dec-1995 dg

Untangled the vm.h include file spaghetti.


# 10344 26-Aug-1995 bde

Change vm_map_print() to have the correct number and type of args for
a ddb command.


# 9507 13-Jul-1995 dg

NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!

Much needed overhaul of the VM system. Included in this first round of
changes:

1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".

2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
escentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.

3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.

4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.

5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.

6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.

7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.

8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.

9) Some almost useless debugging code removed.

10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure escentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.

11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.

12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although will need some additional decision code to do
this, of course).

13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.

14) (dyson) Sharing maps have been removed. It's marginal usefulness in a
threads design can be worked around in other ways. Both #12 and #13
were done to simplify the code and improve readability and maintain-
ability. (As were most all of these changes)

TODO:

1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.

2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.

3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.

4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.

5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).


# 7090 16-Mar-1995 bde

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


# 5455 09-Jan-1995 dg

These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.

The majority of the merged VM/cache work is by John Dyson.

The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.

vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.

vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.

vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.

vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.

vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.

pmap.c vm_map.c
Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of
kernel PTs.

vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.

proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.

swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.

machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.

machdep.c, kern_exec.c, vm_glue.c
Implemented a seperate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.

ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.

Submitted by: John Dyson and David Greenman


# 1817 02-Aug-1994 dg

Added $Id$


# 1549 25-May-1994 rgrimes

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# 1542 24-May-1994 rgrimes

This commit was generated by cvs2svn to compensate for changes in r1541,
which included commits to RCS files with non-trunk default branches.


# 1541 24-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources