History log of /freebsd-9.3-release/sys/i386/i386/pmap.c
Revision Date Author Comments
# 267654 19-Jun-2014 gjb

Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 265554 07-May-2014 alc

MFC r262338
When the kernel is running in a virtual machine, it cannot rely upon the
processor family to determine if the workaround for AMD Family 10h Erratum
383 should be enabled. To enable virtual machine migration among a
heterogeneous collection of physical machines, the hypervisor may have
been configured to report an older processor family with a reduced feature
set. Effectively, the reported processor family and its features are like
a "least common denominator" for the collection of machines.

Therefore, when the kernel is running in a virtual machine, instead of
relying upon the processor family, we now test for features that prove
that the underlying processor is not affected by the erratum. (The
features that we test for are unlikely to ever be emulated in software
on an affected physical processor.)
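
For illustration, the shape of the check is roughly the following. This is
a sketch only: vm_guest, workaround_erratum383, cpu_feature2, and
CPUID_TO_FAMILY() are the real kernel symbols, but CPUID2_NEWER_THAN_10H is
a placeholder for whatever feature bit r262338 actually tests.

    if (vm_guest != VM_GUEST_NO) {
        /*
         * Hypothetical feature bit: its presence proves post-Family-10h
         * silicon, so the erratum cannot apply even if the hypervisor
         * reports an older family.
         */
        if ((cpu_feature2 & CPUID2_NEWER_THAN_10H) == 0)
            workaround_erratum383 = 1;
    } else if (CPUID_TO_FAMILY(cpu_id) == 0x10)
        workaround_erratum383 = 1;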

PR: 186061


# 256207 09-Oct-2013 mav

MFC r251703 (by attilio):
- Add a BIT_FFS() macro and use it to replace cpusetffs_obj()
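
Conceptually, BIT_FFS() computes a 1-based ffs() across the words of a
statically sized bit set. A minimal sketch of the idea (the real macro lives
in sys/bitset.h; this helper is illustrative only):

    static inline int
    bitset_ffs(const long *bits, int nwords)
    {
        int i;

        for (i = 0; i < nwords; i++)
            if (bits[i] != 0)
                return (i * (int)sizeof(long) * NBBY + ffsl(bits[i]));
        return (0);    /* empty set; the result is 1-based otherwise */
    }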


# 255811 23-Sep-2013 kib

MFC r255607:
In pmap_copy(), when the copied region is mapped with a superpage but does
not cover the entire superpage, avoid copying.

MFC r255620:
Merge the change r255607 from amd64 to i386.


# 251897 18-Jun-2013 scottl

Merge the second part of the unmapped I/O changes. This enables the
infrastructure in the block layer and UFS filesystem as well as a few
drivers. The list of MFC revisions is long, so I won't quote changelogs.

r248508,248510,248511,248512,248514,248515,248516,248517,248518,
248519,248520,248521,248550,248568,248789,248790,249032,250936

Submitted by: kib
Approved by: kib
Obtained from: Netflix


# 248814 28-Mar-2013 kib

MFC r248280:
Add pmap function pmap_copy_pages().


# 248085 09-Mar-2013 marius

MFC: r227309 (partial)

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# 247547 01-Mar-2013 jhb

MFC 245577,245640:
Don't attempt to use clflush on the local APIC register window. Various
CPUs exhibit bad behavior if this is done (Intel Errata AAJ3, hangs on
Pentium-M, and trashing of the local APIC registers on a VIA C7). The
local APIC is implicitly mapped UC already via MTRRs, so the clflush isn't
necessary anyway.


# 242172 27-Oct-2012 kib

MFC r242010:
Add missed sched_pin().


# 241518 13-Oct-2012 alc

MFC r241353, r241356, r241400
To avoid page table page corruption, change pmap_pv_reclaim()'s method
of mapping page table pages.

Add some assertions that were helpful in debugging the page table page
corruption.


# 240754 20-Sep-2012 alc

MFC r240317
Simplify pmap_unmapdev().

Don't set PTE_W on the page table entry in pmap_kenter{,_attr}() on MIPS.


# 240151 05-Sep-2012 kib

MFC r233122,r237086,r237228,r237264,r237290,r237404,r237414,r237513,r237551,
r237592,r237604,r237623,r237684,r237733,r237813,r237855,r238124,r238126,
r238163,r238414,r238610,r238889,r238970,r239072,r239137,r240126 (all by alc):

Add fine-grained PV chunk and list locking to the amd64 pmap, enabling
concurrent execution of the following functions on different pmaps:

pmap_change_wiring()
pmap_copy()
pmap_enter()
pmap_enter_object()
pmap_enter_quick()
pmap_page_exists_quick()
pmap_page_is_mapped()
pmap_protect()
pmap_remove()
pmap_remove_pages()

Requested and approved by: alc


# 239996 01-Sep-2012 kib

MFC r238792:
Introduce curpcb magic variable.


# 238005 02-Jul-2012 alc

MFC r236534
Various small changes to PV entry management.


# 237966 02-Jul-2012 alc

MFC r235695, r236158, r236190, r236494
Replace all uses of the vm page queues lock by a r/w lock that is
private to this pmap.c.


# 237952 02-Jul-2012 alc

MFC r235912
There is no need for pmap_protect() to acquire the page queues lock
unless it is going to access the pv lists or PMAP1/PADDR1.


# 237950 02-Jul-2012 alc

MFC r233290, r235598, r235973, r236045, r236240, r236291, r236378
Change pv_entry_count to a long on amd64.

Rename pmap_collect() to pmap_pv_reclaim() and rewrite it such that it
no longer uses the active and inactive paging queues.

Eliminate some purely stylistic differences among the amd64, i386
native, and i386 xen PV entry allocators.

Eliminate code duplication in free_pv_entry() and pmap_remove_pages()
by introducing free_pv_chunk().


# 237949 02-Jul-2012 alc

MFC r226843
Eliminate vestiges of page coloring in VM_ALLOC_NOOBJ calls to
vm_page_alloc(). While I'm here, for the sake of consistency, always
specify the allocation class, such as VM_ALLOC_NORMAL, as the first of
the flags.


# 236123 26-May-2012 alc

MFC r228513
Create large page mappings in pmap_map().


# 235884 24-May-2012 alc

MFC r233433
Disable detailed PV entry accounting by default. Add a config option
to enable it.


# 233775 02-Apr-2012 kib

MFC r233168:
If we ever allow for managed fictitious pages, the pages shall be
excluded from superpage promotions. At least one of the reasons is
that pv_table is sized for non-fictitious pages only.

Consistently check for the page to be non-fictitious before accessing
the superpage pv list.

MFC note: erroneous chunks from r233168 that were fixed in r233185
are not included in the merge.


# 230435 21-Jan-2012 alc

MFC r228923, r228935, and r229007
Eliminate many of the unnecessary differences between the native and
paravirtualized pmap implementations for i386.

Fix a bug in the Xen pmap's implementation of
pmap_extract_and_hold(): If the page lock acquisition is retried,
then the underlying thread is not unpinned.

Wrap nearby lines that exceed 80 columns.

Merge r216333 and r216555 from the native pmap
When r207410 eliminated the acquisition and release of the page
queues lock from pmap_extract_and_hold(), it didn't take into
account that pmap_pte_quick() sometimes requires the page queues
lock to be held. This change reimplements pmap_extract_and_hold()
such that it no longer uses pmap_pte_quick(), and thus never
requires the page queues lock.

Merge r177525 from the native pmap
Prevent the overflow in the calculation of the next page
directory. The overflow causes the wraparound with consequent
corruption of the (almost) whole address space mapping.

Strictly speaking, r177525 is not required by the Xen pmap because
the hypervisor steals the uppermost region of the normal kernel
address space. I am nonetheless merging it in order to reduce the
number of unnecessary differences between the native and Xen pmap
implementations.


# 225736 22-Sep-2011 kensmith

Copy head to stable/9 as part of 9.0-RELEASE release cycle.

Approved by: re (implicit)


# 225418 06-Sep-2011 kib

Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into an atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify aflags.

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by: alc, attilio
Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by: re (bz)


# 224746 09-Aug-2011 kib

- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag
to VPO_UNMANAGED (and also making the flag protected by the vm object
lock, instead of vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and
VPO_UNMANAGED. As a consequence, pmap code can now use just
VPO_UNMANAGED to decide whether the page is unmanaged.

Reviewed by: alc
Tested by: pho (x86, previous version), marius (sparc64),
marcel (arm, ia64, powerpc), ray (mips)
Sponsored by: The FreeBSD Foundation
Approved by: re (bz)


# 223758 04-Jul-2011 attilio

With the retirement of cpumask_t and the use of cpuset_t for representing a
mask of CPUs, pc_other_cpus and pc_cpumask become highly inefficient.

Remove them and replace their usage with custom pc_cpuid magic (as,
atm, pc_cpumask can be easily represented by (1 << pc_cpuid) and
pc_other_cpus by (all_cpus & ~(1 << pc_cpuid))).
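
In cpuset_t terms the replacements look roughly like this (a sketch; per
r222813, PCPU reads like these are done under sched_pin()):

    cpuset_t mask, other_cpus;

    CPU_ZERO(&mask);
    CPU_SET(PCPU_GET(cpuid), &mask);          /* was pc_cpumask */
    other_cpus = all_cpus;
    CPU_CLR(PCPU_GET(cpuid), &other_cpus);    /* was pc_other_cpus */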

This change is not targeted for MFC because of the struct pcpu member
removal and its dependency on the cpumask_t retirement.

MD review by: marcel, marius, alc
Tested by: pluknet
MD testing by: marcel, marius, gonzo, andreast


# 223732 02-Jul-2011 alc

When iterating over a paging queue, explicitly check for PG_MARKER, instead
of relying on zeroed memory being interpreted as an empty PV list.

Reviewed by: kib


# 223677 29-Jun-2011 alc

Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.

This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write(). It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.

Update all of the existing assertions on pmap_remove_all() to reflect this
change.

Reviewed by: kib


# 222813 07-Jun-2011 attilio

Retire the cpumask_t type and replace it with cpuset_t usage.

This is intended to fix the bug where CPU mask objects are
capped at 32 bits. MAXCPU can now be arbitrarily bumped to whatever
value. Anyway, as long as several structures in the kernel are
statically allocated and sized by MAXCPU, it is suggested to keep it
as low as possible for the time being.

Technical notes on this commit itself:
- More functions for handling cpuset_t objects are introduced.
The most notable are cpusetobj_ffs() (which calculates a ffs(3)
for a cpuset_t object), cpusetobj_strprint() (which prepares a string
representing a cpuset_t object) and cpusetobj_strscan() (which
creates a valid cpuset_t starting from a string representation).
- pc_cpumask and pc_other_cpus are targeted for removal soon.
With the move from cpumask_t to cpuset_t they are now inefficient
and not really useful. Anyway, for the time being, please note that
access to pcpu data is protected by sched_pin() in order to avoid
migrating between CPUs while reading more than one word (as may now
be the case).
- Please note that the size of cpuset_t objects may differ between kernel
and userland. While this is not directly related to the patch itself,
it is good to understand that concept and possibly use the patch
as a reference on how to deal with cpuset_t objects in userland when
accessing kernel-land members.
- KTR_CPUMASK is changed and is now represented through a string, to be
set as in the example reported in NOTES.

Note additionally that MAXCPU is not bumped in this patch, but
private testing has been done up to MAXCPU=128 on a real 8x8x2 (HTT)
machine (amd64).

Please note that the FreeBSD version is not yet bumped because of
the upcoming pcpu changes. However, note that this patch is not
targeted for MFC.

People to thank for the time spent on this patch:
- sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested
several revisions of the patches and really helped in improving
the stability of this work.
- marius fixed several bugs in the sparc64 implementation and reviewed
patches related to ktr.
- jeff and jhb discussed the basic approach followed.
- kib and marcel made targeted review on some specific part of the
patch.
- marius, art, nwhitehorn and andreast reviewed MD specific part of
the patch.
- marius, andreast, gonzo, nwhitehorn and jceel tested MD specific
implementations of the patch.
- Other people have made contributions on other patches that have been
already committed and have been listed separately.

Companies that should be mentioned for having participated to several
degrees:
- Yahoo! for having offered the machines used for testing on large
CPU counts.
- The FreeBSD Foundation for having sponsored my devsummit attendance,
which has been instrumental.
- Sandvine for having offered offices and infrastructure during
development.

(I really hope I didn't forget anyone; if I did, I apologize in
advance.)


# 220803 18-Apr-2011 kib

Make pmap_invalidate_cache_range() available for consumption on amd64.

Add a pmap_invalidate_cache_pages() method on x86. It flushes the CPU
cache for a set of pages, which are not necessarily mapped. Since its
intended use is to prepare moving ownership of the pages to a device
that does not snoop all CPU accesses to the main memory (read: the GPU
in a GMCH), do not rely on the CPU self-snoop feature.

The amd64 implementation takes advantage of the direct map. On i386,
extract the helper pmap_flush_page() from pmap_page_set_memattr(), and
use it to make a temporary mapping of the flushed page.

Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks


# 217688 21-Jan-2011 pluknet

Make the MSGBUF_SIZE kernel option a loader tunable, kern.msgbufsize.

Submitted by: perryh pluto.rain.com (previous version)
Reviewed by: jhb
Approved by: kib (mentor)
Tested by: universe


# 216555 19-Dec-2010 alc

Redo some parts of r216333, specifically, the locking changes to
pmap_extract_and_hold(), and undo the rest. In particular, I forgot
that PG_PS and PG_PTE_PAT are the same bit.


# 216516 18-Dec-2010 kib

In pmap_extract(), unlock pmap lock earlier. The calculation does not need
the lock when operating on local variables.

Reviewed by: alc


# 216333 09-Dec-2010 alc

When r207410 eliminated the acquisition and release of the page queues
lock from pmap_extract_and_hold(), it didn't take into account that
pmap_pte_quick() sometimes requires the page queues lock to be held.
This change reimplements pmap_extract_and_hold() such that it no
longer uses pmap_pte_quick(), and thus never requires the page queues
lock.

For consistency, adopt the same idiom as used by the new
implementation of pmap_extract_and_hold() in pmap_extract() and
pmap_mincore(). It also happens to make these functions shorter.

Fix a style error in pmap_pte().

Reviewed by: kib@


# 215754 23-Nov-2010 jkim

Remove a stale tunable introduced in r215703.


# 215703 22-Nov-2010 jkim

- Disable caches and flush caches/TLBs when we update PAT as we do for MTRR.
Flushing TLBs is required to ensure cache coherency according to the AMD64
architecture manual. Flushing caches is only required when changing from a
cacheable memory type (WB, WP, or WT) to an uncacheable type (WC, UC, or
UC-). Since this function is only used once per processor during startup,
there is no need to take any shortcuts.
- Leave PAT indices 0-3 at the default of WB, WT, UC-, and UC. Program 5 as
WP (from default WT) and 6 as WC (from default UC-). Leave 4 and 7 at the
default of WB and UC. This is to avoid a transition from a cacheable memory
type to an uncacheable type, to minimize possible cache incoherency. Since
we now flush caches and TLBs, this change may not be necessary
any more, but we do not want to take any chances.
- Remove Apple hardware specific quirks. With the above changes, it seems
this hack is no longer needed.
- Improve pmap_cache_bits() with an array that maps PAT memory types to
PAT indices. This array is initialized early from pmap_init_pat(), so we
no longer need to handle special cases in the function. Now this function
is identical on both amd64 and i386.
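
The table-driven lookup is close to the following (a sketch, not the
verbatim commit):

    /* Sketch: maps a vm_memattr_t mode to its programmed PAT index. */
    static int pat_index[PAT_INDEX_SIZE];    /* filled by pmap_init_pat() */

    int
    pmap_cache_bits(int mode, boolean_t is_pde)
    {
        int cache_bits, pat_idx;

        pat_idx = pat_index[mode];

        /* Map the 3-bit PAT index onto the PTE/PDE control bits. */
        cache_bits = 0;
        if (pat_idx & 0x4)
            cache_bits |= is_pde ? PG_PDE_PAT : PG_PTE_PAT;
        if (pat_idx & 0x2)
            cache_bits |= PG_NC_PCD;
        if (pat_idx & 0x1)
            cache_bits |= PG_NC_PWT;
        return (cache_bits);
    }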

Reviewed by: jhb
Tested by: RM (reuf_m at hotmail dot com)
Ryszard Czekaj (rychoo at freeshell dot net)
army.of.root (army dot of dot root at googlemail dot com)
MFC after: 3 days


# 214938 07-Nov-2010 alc

Eliminate a possible race between pmap_pinit() and pmap_kenter_pde() on
superpage promotion or demotion.

Micro-optimize pmap_kenter_pde().

Reviewed by: kib, jhb (an earlier version)
MFC after: 1 week


# 213455 05-Oct-2010 alc

Initialize KPTmap in locore so that vm86.c can call vtophys() (or really
pmap_kextract()) before pmap_bootstrap() is called.

Document the set of pmap functions that may be called before
pmap_bootstrap() is called.

Tested by: bde@
Reviewed by: kib@
Discussed with: jhb@
MFC after: 6 weeks


# 211424 17-Aug-2010 gahr

- The iMac9,1 needs the PAT workaround as well

Approved by: cognet


# 211149 10-Aug-2010 attilio

Fix some places that should use cpumask_t while they still use 'int' types.
While there, also fix some places that assume the cpu type is 'int' while
u_int is really meant.

Note: this will also fix some possible races in per-cpu data accessings
to be addressed in further commits.

In collaboration with: Yahoo! Incorporated (via sbruno and peter)
Tested by: gianni
MFC after: 1 month


# 209887 10-Jul-2010 alc

Reduce the number of global TLB shootdowns generated by pmap_qenter().
Specifically, teach pmap_qenter() to recognize the case when it is being
asked to replace a mapping with the very same mapping and not generate
a shootdown. Unfortunately, the buffer cache commonly passes an entire
buffer to pmap_qenter() when only a subset of the mappings are changing.
For the extension of buffers in allocbuf() this was resulting in
unnecessary shootdowns. The addition of new pages to the end of the
buffer need not and did not trigger a shootdown, but overwriting the
initial mappings with the very same mappings was seen as a change that
necessitated a shootdown. With this change, that is no longer so.
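
The resulting loop in pmap_qenter() is essentially the following (a sketch
close to the committed code; identifiers as in the i386 pmap):

    pt_entry_t oldpte, pa, *pte, *endpte;
    vm_page_t m;

    oldpte = 0;
    pte = vtopte(sva);
    endpte = pte + count;
    while (pte < endpte) {
        m = *ma++;
        pa = VM_PAGE_TO_PHYS(m) | pmap_cache_bits(m->md.pat_mode, 0);
        if ((*pte & (PG_FRAME | PG_PTE_CACHE)) != pa) {
            oldpte |= *pte;    /* accumulate PG_V of replaced PTEs */
            pte_store(pte, pa | pgeflag | PG_RW | PG_V);
        }
        pte++;
    }
    /* Shoot down only if some valid mapping was actually replaced. */
    if ((oldpte & PG_V) != 0)
        pmap_invalidate_range(kernel_pmap, sva,
            sva + count * PAGE_SIZE);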

For a "buildworld" on amd64, this change eliminates 14-15% of the
pmap_invalidate_range() shootdowns, and about 4% of the overall
shootdowns.

MFC after: 3 weeks


# 209862 09-Jul-2010 kib

For both i386 and amd64 pmap,
- change the type of pm_active to cpumask_t, which it is;
- in pmap_remove_pages(), compare with PCPU(curpmap), instead of
dereferencing the long chain of pointers [1].
For amd64 pmap, remove the unneeded checks for validity of curpmap
in pmap_activate(), since curpmap should always be valid after
r209789.

Submitted by: alc [1]
Reviewed by: alc
MFC after: 3 weeks


# 209048 11-Jun-2010 alc

Relax one of the new assertions in pmap_enter() a little. Specifically,
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)


# 208990 10-Jun-2010 alc

Reduce the scope of the page queues lock and the number of
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines. Update a stale comment.

Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked. Add a comment documenting this.

Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.

Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick(). (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)

Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page. Assert this rather than returning "0"
and "FALSE" respectively.

ARM:

Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().

Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.

PowerPC/AIM:

moea*_page_exists_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete. Therefore, the check
for moea_initialized can be eliminated.

Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().

The last parameter to moea*_clear_bit() is never used. Eliminate it.

PowerPC/BookE:

Simplify mmu_booke_page_exists_quick()'s control flow.

Reviewed by: kib@


# 208765 03-Jun-2010 alc

In the unlikely event that pmap_ts_referenced() demoted five superpage
mappings to the same underlying physical page, the calling thread would be
left forever pinned to the same processor.

MFC after: 3 days


# 208667 31-May-2010 alc

Eliminate a stale comment.


# 208657 30-May-2010 alc

Simplify the inner loop of pmap_collect(): While iterating over the page's
pv list, there is no point in checking whether or not the pv list is empty.
Instead, wait until the loop completes.


# 208645 29-May-2010 alc

When I pushed down the page queues lock into pmap_is_modified(), I created
an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls
vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified()
could return FALSE without acquiring the page queues lock because the page
is not (currently) writeable, and the caller to pmap_is_modified() might
believe that the page's dirty field is clear because it has not seen the
effect of the vm_page_dirty() call.

When I pushed down the page queues lock into pmap_is_modified(), I
overlooked one place where this ordering dependence is violated:
pmap_enter(). In a rare situation pmap_enter() can be called to replace a
dirty mapping to one page with a mapping to another page. (I say rare
because replacements generally occur as a result of a copy-on-write fault,
and so the old page is not dirty.) This change delays clearing PG_WRITEABLE
until after vm_page_dirty() has been called.

Fixing the ordering dependency also makes it easy to introduce a small
optimization: When pmap_enter() used to replace a mapping to one page with a
mapping to another page, it freed the pv entry for the first mapping and
later called the pv entry allocator for the new mapping. Now, pmap_enter()
attempts to recycle the old pv entry, saving two calls to the pv entry
allocator.

There is no point in setting PG_WRITEABLE on unmanaged pages, so don't.
Update a comment to reflect this.

Tidy up the variable declarations at the start of pmap_enter().


# 208609 28-May-2010 alc

Defer freeing any page table pages in pmap_remove_all() until after the
page queues lock is released. This may reduce the amount of time that the
page queues lock is held by pmap_remove_all().


# 208574 26-May-2010 alc

Push down page queues lock acquisition in pmap_enter_object() and
pmap_is_referenced(). Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively. In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.

Assert that the page is managed in pmap_is_referenced().

On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.

Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting. This scenario is described in detail by a
comment.

Correct a spelling error in vm_page_dontneed().

Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.

Add object locking to vnode_pager_generic_putpages(). This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.

Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().

Change vnode_pager_generic_putpages() to the modern-style of function
definition. Also, change the name of one of the parameters to follow
virtual memory system naming conventions.

Reviewed by: kib


# 208504 24-May-2010 alc

Roughly half of a typical pmap_mincore() implementation is machine-
independent code. Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified(). Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization. Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean(). Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by: kib (an earlier version)


# 208175 16-May-2010 alc

On entry to pmap_enter(), assert that the page is busy. While I'm
here, make the style of assertion used by pmap_enter() consistent
across all architectures.

On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.

With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy. Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.

Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try. Previously, the call to pmap_remove_write()
would have failed silently.


# 207796 08-May-2010 alc

Push down the page queues lock into vm_page_cache(), vm_page_try_to_cache(),
and vm_page_try_to_free(). Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mappings(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped(). (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.


# 207702 06-May-2010 alc

Push down the page queues lock inside of vm_page_free_toq() and
pmap_page_is_mapped() in preparation for removing page queues locking
around calls to vm_page_free(). Setting aside the assertion that calls
pmap_page_is_mapped(), vm_page_free_toq() now acquires and holds the page
queues lock just long enough to actually add or remove the page from the
paging queues.

Update vm_page_unhold() to reflect the above change.


# 207410 29-Apr-2010 kmacy

On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib


# 207205 25-Apr-2010 alc

Clearing a page table entry's accessed bit (PG_A) and setting the
page's PG_REFERENCED flag in pmap_protect() can't really be justified.
In contrast to pmap_remove() or pmap_remove_all(), the mapping is not
being destroyed, so the notion that the page was accessed is not lost.
Moreover, clearing the page table entry's accessed bit and setting the
page's PG_REFERENCED flag can throw off the page daemon's activity
count calculation. Finally, in my tests, I found that 15% of the
atomic memory operations being performed by pmap_protect() were only
to clear PG_A, and not change protection. This could, by itself, be
fixed, but I don't see the point given the above argument.

Remove a comment from pmap_protect_pde() that is no longer meaningful
after the above change.


# 207163 24-Apr-2010 kmacy

- fix style issues on i386 as well

requested by: alc@


# 207155 24-Apr-2010 alc

Resurrect pmap_is_referenced() and use it in mincore(). Essentially,
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters. For example, in mincore(), clearing the reference
bits has two negative consequences. First, it throws off the activity
count calculations performed by the page daemon. Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should. Consequently, the page could be
deactivated prematurely by the page daemon. Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page. However, there is a second problem for which
that is not a solution. In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping. Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!


# 205778 27-Mar-2010 alc

Correctly handle preemption of pmap_update_pde_invalidate().

X-MFC after: r205573


# 205773 27-Mar-2010 alc

Simplify pmap_growkernel(), making the i386 version more like the amd64
version.

MFC after: 3 weeks


# 205652 25-Mar-2010 alc

A ptrace(2) by one processor may trigger a promotion in the address space
of another process. Modify pmap_promote_pde() to handle this. (This is
not a problem on amd64 due to implementation differences.)

Reported by: jh@
MFC after: 1 week


# 205573 24-Mar-2010 alc

Adapt r204907 and r205402, the amd64 implementation of the workaround for
AMD Family 10h Erratum 383, to i386.

Enable machine check exceptions by default, just like r204913 for amd64.

Enable superpage promotion only if the processor actually supports large
pages, i.e., PG_PS.

MFC after: 2 weeks


# 205334 19-Mar-2010 avg

pmap amd64/i386: fix a typo in a comment

MFC after: 3 days


# 205063 12-Mar-2010 jhb

Fix the previous attempt to fix kernel builds of HEAD on 7.x. Use the
__gnu_inline__ attribute for PMAP_INLINE when using the 7.x compiler to
match what 7.x uses for PMAP_INLINE.


# 204957 10-Mar-2010 kib

Fall back to wbinvd when the region for CLFLUSH is >= 2MB.
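
With this change, pmap_invalidate_cache_range() behaves roughly as follows
(a sketch using the real cpufunc.h primitives):

    if ((cpu_feature & CPUID_SS) != 0)
        ;    /* self-snoop: no explicit flush is needed */
    else if ((cpu_feature & CPUID_CLFSH) != 0 &&
        eva - sva < 2 * 1024 * 1024) {
        mfence();    /* order clflush against surrounding stores */
        for (; sva < eva; sva += cpu_clflush_line_size)
            clflush(sva);
        mfence();
    } else
        wbinvd();    /* past 2MB, flushing everything is cheaper */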

Submitted by: Kevin Day <toasty dragondata com>
Reviewed by: jhb
MFC after: 2 weeks


# 204420 27-Feb-2010 alc

When running as a guest operating system, the FreeBSD kernel must assume
that the virtual machine monitor has enabled machine check exceptions.
Unfortunately, on AMD Family 10h processors the machine check hardware
has a bug (Erratum 383) that can result in a false machine check exception
when a superpage promotion occurs. Thus, I am disabling superpage
promotion when the FreeBSD kernel is running as a guest operating system
on an AMD Family 10h processor.

Reviewed by: jhb, kib
MFC after: 3 days


# 204041 18-Feb-2010 ed

Allow the pmap code to be built with GCC from FreeBSD 7 again.

This patch basically gives us the best of both worlds. Instead of
forcing the compiler to emulate GNU-style inline semantics even though
we're using ISO C99, it will only use GNU-style inlining when the
compiler is configured that way (__GNUC_GNU_INLINE__).

Tested by: jhb


# 203351 01-Feb-2010 alc

Change the default value for the flag enabling superpage mapping and
promotion to "on".


# 203085 27-Jan-2010 alc

Optimize pmap_demote_pde() by using the new KPTmap to access a kernel
page table page instead of creating a temporary mapping to it.

Set the PG_G bit on the page table entries that implement the KPTmap.

Locore initializes the unused portions of the NKPT kernel page table
pages that it allocates to zero. So, pmap_bootstrap() needn't zero
the page table entries referenced by CMAP1 and CMAP3.

Simplify pmap_set_pg().

MFC after: 10 days


# 202894 23-Jan-2010 alc

Handle a race between pmap_kextract() and pmap_promote_pde(). This race is
known to cause a kernel crash in ZFS on i386 when superpage promotion is
enabled.

Tested by: netchild
MFC after: 1 week


# 202628 19-Jan-2010 ed

Recommit r193732:

Remove __gnu89_inline.

Now that we use C99 almost everywhere, just use C99-style in the pmap
code. Since the pmap code is the only consumer of __gnu89_inline, remove
it from cdefs.h as well. Because the flag was only introduced 17 months
ago, I don't expect any problems.

Reviewed by: alc

It was backed out because it prevented us from building kernels using a
7.x compiler. Now that most people use 8.x, there is nothing that holds
us back. Even if people run 7.x, they should be able to build a kernel
if they run `make kernel-toolchain' or `make buildworld' first.


# 202085 11-Jan-2010 alc

Simplify pmap_init(). Additionally, correct a harmless misbehavior on i386.
Specifically, where locore had created large page mappings for the kernel,
the wrong vm page array entries were being initialized. The vm page array
entries for the pages containing the kernel were being initialized instead
of the vm page array entries for page table pages.

MFC after: 1 week


# 201934 09-Jan-2010 alc

Long ago, in r120654, the rounding of KERNend and physfree in locore
was changed from a small page boundary to a large page boundary. As
a consequence pmap_kmem_choose() became a pointless waste of address
space. Eliminate it.


# 201751 07-Jan-2010 alc

Make pmap_set_pg() static.


# 199184 11-Nov-2009 avg

reflect that pg_ps_enabled is a tunable, not just a read-only sysctl

Nod from: jhb


# 198341 21-Oct-2009 marcel

o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
vm_sync_icache() is basically a wrapper around pmap_sync_icache()
(sketched below) that translates the vm_map_t argument to pmap_t.
o Introduce pmap_sync_icache() to all PMAP implementations. For powerpc
it replaces the pmap_page_executable() function, added to solve
the I-cache problem in uiomove_fromphys().
o In proc_rwmem() call vm_sync_icache() when writing to a page that
has execute permissions. This assures that when breakpoints are
written, the I-cache will be coherent and the process will actually
hit the breakpoint.
o This also fixes the Book-E PMAP implementation that was missing
necessary locking while trying to deal with the I-cache coherency
in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.
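
The MI wrapper described above is essentially a one-liner (sketch):

    void
    vm_sync_icache(vm_map_t map, vm_offset_t va, vm_size_t sz)
    {

        pmap_sync_icache(vm_map_pmap(map), va, sz);
    }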


# 197317 18-Sep-2009 alc

When superpages are enabled, add the 2 or 4MB page size to the array of
supported page sizes.

Reviewed by: jhb
MFC after: 3 weeks


# 197070 10-Sep-2009 jkim

Consolidate CPUID to CPU family/model macros for amd64 and i386 to reduce
unnecessary #ifdef's for shared code between them.
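
For reference, the family/model extraction that these macros centralize
works as follows per the x86 CPUID specification (illustrative helpers,
not the verbatim macros):

    static u_int
    cpuid_to_family(u_int id)
    {
        u_int fam = (id >> 8) & 0xf;

        /* The extended family field only applies when family == 0xf. */
        return (fam == 0xf ? fam + ((id >> 20) & 0xff) : fam);
    }

    static u_int
    cpuid_to_model(u_int id)
    {
        u_int fam = (id >> 8) & 0xf, model = (id >> 4) & 0xf;

        /* The extended model field applies to families 0x6 and 0xf. */
        if (fam == 0x6 || fam == 0xf)
            model |= ((id >> 16) & 0xf) << 4;
        return (model);
    }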


# 196771 02-Sep-2009 jkim

Fix confusing comments about default PAT entries.


# 196769 02-Sep-2009 jkim

- Work around an ACPI mode transition problem on recent NVIDIA 9400M chipset
based Intel Macs. Since r189055, these platforms started freezing when
ACPI is being initialized, for an unknown reason. For these platforms, we just
use the old PAT layout. Note this change is not enough to boot fully on
these platforms because of other problems, but it makes debugging possible.
Note that MacBook5,2 may be affected as well, but it was not added here for
lack of hardware to test.
- Initialize the PAT MSR fully instead of reading and modifying it, for safety.

Reported by: rpaulo, hps, Eygene Ryabinkin (rea-fbsd at codelabs dot ru)
Reviewed by: jhb


# 196707 31-Aug-2009 jhb

Simplify pmap_change_attr() a bit:
- Always calculate the cache bits instead of doing it on-demand.
- Always set changed to TRUE rather than only doing it if it is false.

Discussed with: alc
MFC after: 3 days


# 196705 31-Aug-2009 jhb

Improve pmap_change_attr() so that it is able to demote a large (2/4MB)
page into 4KB pages as needed. This should be fairly rare in practice
on i386. This includes merging the following changes from the amd64 pmap:
180430, 180485, 180845, 181043, 181077, and 196318.
- Add basic support for changing attributes on PDEs to pmap_change_attr()
similar to the support in the initial version of pmap_change_attr() on
amd64 including inlines for pmap_pde_attr() and pmap_pte_attr().
- Extend pmap_demote_pde() to include the ability to instantiate a new page
table page where none existed before.
- Enhance pmap_change_attr(). Use pmap_demote_pde() to demote a 2/4MB page
mapping to 4KB page mappings when the specified attribute change only
applies to a portion of the 2/4MB page. Previously, in such cases,
pmap_change_attr() gave up and returned an error.
- Correct a critical accounting error in pmap_demote_pde().

Reviewed by: alc
MFC after: 3 days


# 196643 29-Aug-2009 rnoland

Swap the start/end virtual addresses in pmap_invalidate_cache_range().

This fixes the functionality on non-self-snoop hardware.

Found by: rnoland
Submitted by: alc
Reviewed by: kib
MFC after: 3 days


# 195940 29-Jul-2009 kib

As was done in r195820 for amd64, on i386 use clflush for flushing cache
lines when a memory page's caching attributes change and the CPU does not
support self-snoop but does implement clflush.

Take care of possible mappings of the page by an sf buffer by utilizing
the mapping for clflush; otherwise, map the page transiently. Amd64
uses the direct map.

Proposed and reviewed by: alc
Approved by: re (kensmith)


# 195840 24-Jul-2009 jhb

Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses. The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks


# 195836 23-Jul-2009 alc

Eliminate unnecessary cache and TLB flushes by pmap_change_attr(). (This
optimization was implemented in the amd64 version roughly 1 year ago.)

Approved by: re (kensmith)


# 195774 19-Jul-2009 alc

Change the handling of fictitious pages by pmap_page_set_memattr() on
amd64 and i386. Essentially, fictitious pages provide a mechanism for
creating aliases for either normal or device-backed pages. Therefore,
pmap_page_set_memattr() on a fictitious page needn't update the direct
map or flush the cache. Such actions are the responsibility of the
"primary" instance of the page or the device driver that "owns" the
physical address. For example, these actions are already performed by
pmap_mapdev().

The device pager needn't restore the memory attributes on a fictitious
page before releasing it. It's now pointless.

Add pmap_page_set_memattr() to the Xen pmap.

Approved by: re (kib)


# 195749 17-Jul-2009 alc

An addendum to r195649, "Add support to the virtual memory system for
configuring machine-dependent memory attributes...":

Don't set the memory attribute for a "real" page that is allocated to
a device object in vm_page_alloc(). It is a pointless act, because
the device pager replaces this "real" page with a "fake" page and sets
the memory attribute on that "fake" page.

Eliminate pointless code from pmap_cache_bits() on amd64.

Employ the "Self Snoop" feature supported by some x86 processors to
avoid cache flushes in the pmap.

Approved by: re (kib)


# 195649 12-Jul-2009 alc

Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes. Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures. The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map. The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

kmem_alloc_contig() can now be used to allocate kernel memory with
non-default memory attributes on amd64 and i386.

vm_page_alloc() and the device pager will set the memory attributes
for the real or fictitious page according to the object's default
memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386. In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by: re (kib)


# 195385 05-Jul-2009 alc

PAE adds another level to the i386 page table. This level is a small
4-entry table that must be located within the first 4GB of RAM. This
requirement is met by defining an UMA zone with a custom back-end
allocator function. This revision makes two changes to this back-end
allocator function: (1) It replaces the use of contigmalloc() with the
use of kmem_alloc_contig(). This eliminates "double accounting", i.e.,
accounting by both the UMA zone and malloc tags. (I made the same
change for the same reason to the zones supporting jumbo frames a week
ago.) (2) It passes through the "wait" parameter, i.e., M_WAITOK,
M_ZERO, etc. to kmem_alloc_contig() rather than ignoring it.
pmap_init() calls uma_zalloc() with both M_WAITOK and M_ZERO. At the
moment, this is harmless only because the default behavior of
contigmalloc()/kmem_alloc_contig() is to wait and because pmap_init()
doesn't really depend on the memory being zeroed.

The back-end allocator function in the Xen pmap is dead code. I am
changing it nonetheless because I don't want to leave any "bad examples"
in the source tree for someone to copy at a later date.

Approved by: re (kib)


# 194209 14-Jun-2009 alc

Long, long ago in r27464 special case code for mapping device-backed
memory with 4MB pages was added to pmap_object_init_pt(). This code
assumes that the pages of an OBJT_DEVICE object are always physically
contiguous. Unfortunately, this is not always the case. For example,
jhb@ informs me that the recently introduced /dev/ksyms driver creates
an OBJT_DEVICE object that violates this assumption. Thus, this
revision modifies pmap_object_init_pt() to abort the mapping if the
OBJT_DEVICE object's pages are not physically contiguous. This
revision also changes some inconsistent if not buggy behavior. For
example, the i386 version aborts if the first 4MB virtual page that
would be mapped is already valid. However, it incorrectly replaces
any subsequent 4MB virtual page mappings that it encounters,
potentially leaking a page table page. The amd64 version has a bug of
my own creation. It potentially busies the wrong page and always an
insufficient number of pages if it blocks allocating a page table page.

To my knowledge, there have been no reports of these bugs, hence,
their persistence. I suspect that the existing restrictions that
pmap_object_init_pt() placed on the OBJT_DEVICE objects that it would
choose to map, for example, that the first page must be aligned on a 2
or 4MB physical boundary and that the size of the mapping must be a
multiple of the large page size, were enough to avoid triggering the
bug for drivers like ksyms. However, one side effect of testing the
OBJT_DEVICE object's pages for physical contiguity is that a dubious
difference between pmap_object_init_pt() and the standard path for
mapping devices pages, i.e., vm_fault(), has been eliminated.
Previously, pmap_object_init_pt() would only instantiate the first
PG_FICTITIOUS page being mapped because it never examined the rest.
Now, however, pmap_object_init_pt() uses the new function
vm_object_populate() to instantiate them all (in order to support
testing their physical contiguity). These pages need to be
instantiated for the mechanism that I have prototyped for
automatically maintaining the consistency of the PAT settings across
multiple mappings, particularly, amd64's direct mapping, to work.
(Translation: This change is also being made to support jhb@'s work on
the Nvidia feature requests.)

Discussed with: jhb@


# 193734 08-Jun-2009 ed

Revert my change; reintroduce __gnu89_inline.

It turns out our compiler in stable/7 can't build this code anymore.
Even though my opinion is that those people should just run `make
kernel-toolchain' before building a kernel, I am willing to wait and
commit this after we've branched stable/8.

Requested by: rwatson


# 193732 08-Jun-2009 ed

Remove __gnu89_inline.

Now that we use C99 almost everywhere, just use C99-style in the pmap
code. Since the pmap code is the only consumer of __gnu89_inline, remove
it from cdefs.h as well. Because the flag was only introduced 17 months
ago, I don't expect any problems.

Reviewed by: alc


# 192114 14-May-2009 attilio

Right now, FreeBSD supports at most 32 CPUs on all architectures.
With the arrival of 128+ cores it is necessary to handle more than that.
One of the first things to change is the support for cpumask_t, which needs
to handle masking of more than 32 bits (which happens now). Some places,
however, still assume that cpumask_t is a 32-bit mask.
Fix that situation by always using cpumask_t correctly where needed.

While here, remove the part under STOP_NMI for the Xen support, as it
is broken in any case.

Additionally, make ipi_nmi_pending static.

Reviewed by: jhb, kmacy
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


# 192035 13-May-2009 alc

Correct a rare use-after-free error in pmap_copy(). This error was
introduced in amd64 revision 1.540 and i386 revision 1.547. However, it
had no harmful effects until after a recent change, r189698, on amd64.
(In other words, the error is harmless in RELENG_7.)

The error is triggered by the failure to allocate a pv entry for the one
and only mapping in a page table page. I am addressing the error by
changing pmap_copy() to abort if either pv entry allocation or page
table page allocation fails. This is appropriate because the creation of
mappings by pmap_copy() is optional. They are a (possible) optimization,
and not a requirement.

Correct a nearby whitespace error in the i386 pmap_copy().

Crash reported by: jeff@
MFC after: 6 weeks


# 189795 14-Mar-2009 alc

MFamd64 r189785
Update the pmap's resident page count when a page table page is freed in
pmap_remove_pde() and pmap_remove_pages().

MFC after: 6 weeks


# 189055 25-Feb-2009 jkim

Enable support for PAT_WRITE_PROTECTED and PAT_UNCACHED cache modes
unconditionally on amd64. On i386, we assume PAT is usable if the CPU
vendor is not Intel or CPU model is newer than Pentium IV.

Reviewed by: alc, jhb


# 188932 23-Feb-2009 alc

Optimize free_pv_entry(); specifically, avoid repeated TAILQ_REMOVE()s.

MFC after: 1 week


# 188608 14-Feb-2009 alc

Remove unnecessary page queues locking around vm_page_busy() and
vm_page_wakeup(). (This change is applicable to RELENG_7 but not
RELENG_6.)

MFC after: 1 week


# 183207 20-Sep-2008 alc

MFamd64 SVN rev 179749 CVS rev 1.620
Reverse the direction of pmap_promote_pde()'s traversal over the specified
page table page. The direction of the traversal can matter if
pmap_promote_pde() has to remove write access (PG_RW) from a PTE that
hasn't been modified (PG_M). In general, if there are two or more such
PTEs to choose among, it is better to write protect the one nearer the
high end of the page table page rather than the low end. This is because
most programs access memory in an ascending direction. The net result of
this change is a sometimes significant reduction in the number of failed
promotion attempts and the number of pages that are write protected by
pmap_promote_pde().

MFamd64 SVN rev 179777 CVS rev 1.621
Tweak the promotion test in pmap_promote_pde(). Specifically, test PG_A
before PG_M. This sometimes prevents unnecessary removal of write access
from a PTE. Overall, the net result is fewer demotions and promotion
failures.


# 183169 19-Sep-2008 alc

MFamd64 SVN rev 179471 CVS rev 1.619
Correct an error in pmap_promote_pde() that may result in an errant
promotion within the kernel's address space.


# 181284 04-Aug-2008 alc

Make pmap_kenter_attr() static.


# 180873 28-Jul-2008 alc

Correct an off-by-one error in the previous change to pmap_change_attr().
Change the nearby comment to mention the recursive map.


# 180871 28-Jul-2008 alc

Don't allow pmap_change_attr() to be applied to the recursive mapping.


# 180846 27-Jul-2008 alc

Style fixes to several function definitions.


# 180601 18-Jul-2008 alc

Correct an error in pmap_change_attr()'s initial loop that verifies that the
given range of addresses are mapped. Previously, the loop was testing the
same address every time.

Submitted by: Magesh Dhasayyan


# 180600 18-Jul-2008 alc

Simplify pmap_extract()'s control flow, making it more like the related
functions pmap_extract_and_hold() and pmap_kextract().


# 180352 07-Jul-2008 alc

In FreeBSD 7.0 and beyond, pmap_growkernel() should pass VM_ALLOC_INTERRUPT
to vm_page_alloc() instead of VM_ALLOC_SYSTEM. VM_ALLOC_SYSTEM was the
logical choice before FreeBSD 7.0 because VM_ALLOC_INTERRUPT could not
reclaim a cached page. Simply put, there was no ordering between
VM_ALLOC_INTERRUPT and VM_ALLOC_SYSTEM as to which "dug deeper" into the
cache and free queues. Now, there is; VM_ALLOC_INTERRUPT dominates
VM_ALLOC_SYSTEM.

While I'm here, teach pmap_growkernel() to request a prezeroed page.
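
After this change, the allocation in pmap_growkernel() reads roughly as
follows (a sketch, not the verbatim commit):

    nkpg = vm_page_alloc(NULL, kernel_vm_end >> PDRSHIFT,
        VM_ALLOC_INTERRUPT | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED |
        VM_ALLOC_ZERO);
    if (nkpg == NULL)
        panic("pmap_growkernel: no memory to grow kernel");
    if ((nkpg->flags & PG_ZERO) == 0)
        pmap_zero_page(nkpg);    /* zero only if not prezeroed */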

MFC after: 1 week


# 179081 18-May-2008 alc

Retire pmap_addr_hint(). It is no longer used.


# 178947 11-May-2008 alc

Correct an error in pmap_align_superpage(). Specifically, correctly
handle the case where the mapping is greater than a superpage in size
but the alignment of the physical pages spans a superpage boundary.


# 178875 09-May-2008 alc

Introduce pmap_align_superpage(). It increases the starting virtual
address of the given mapping if a different alignment might result in more
superpage mappings.


# 178493 25-Apr-2008 alc

Always use PG_PS_FRAME to extract the physical address of a 2/4MB page
from a PDE.
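
The reason is that a 2/4MB PDE repurposes some low address bits as flags
(notably the PAT bit at bit 12), so the extraction idiom, as used in
pmap_kextract(), is:

    if ((pde & PG_PS) != 0)
        pa = (pde & PG_PS_FRAME) | (va & PDRMASK);
    else
        pa = (pte & PG_FRAME) | (va & PAGE_MASK);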


# 178070 10-Apr-2008 alc

Correct pmap_copy()'s method for extracting the physical address of a
2/4MB page from a PDE. Specifically, change it to use PG_PS_FRAME,
not PG_FRAME, to extract the physical address of a 2/4MB page from a
PDE.

Change the last argument passed to pmap_pv_insert_pde() from a
vm_page_t representing the first 4KB page of a 2/4MB page to the
vm_paddr_t of the 2/4MB page. This avoids an otherwise unnecessary
conversion from a vm_paddr_t to a vm_page_t in pmap_copy().


# 177967 07-Apr-2008 alc

Update pmap_page_wired_mappings() so that it counts 2/4MB page mappings.


# 177921 04-Apr-2008 alc

Reintroduce UMA_SLAB_KMAP; however, change its spelling to
UMA_SLAB_KERNEL for consistency with its sibling UMA_SLAB_KMEM.
(UMA_SLAB_KMAP met its original demise in revision 1.30 of
vm/uma_core.c.) UMA_SLAB_KERNEL is now required by the jumbo frame
allocators. Without it, UMA cannot correctly return pages from the
jumbo frame zones to the VM system because it resets the pages' object
field to NULL instead of the kernel object. In more detail, the jumbo
frame zones are created with the option UMA_ZONE_REFCNT. This causes
UMA to overwrite the pages' object field with the address of the slab.
However, when UMA wants to release these pages, it doesn't know how to
restore the object field, so it sets it to NULL. This change teaches
UMA how to reset the object field to the kernel object.

Crashes reported by: kris
Fix tested by: kris
Fix discussed with: jeff
MFC after: 6 weeks


# 177702 29-Mar-2008 alc

Eliminate an #if 0/#endif that was unintentionally introduced
by the previous revision.


# 177684 28-Mar-2008 brooks

Use ; instead of : to end a line.

Submitted by: Niclas Zeising <niclas dot zeising at gmail dot com>


# 177680 28-Mar-2008 ps

Add support to mincore for detecting whether a page is part of a
"super" page or not.

Reviewed by: alc, ups


# 177659 27-Mar-2008 alc

MFamd64 with few changes:

1. Add support for automatic promotion of 4KB page mappings to 2MB page
mappings. Automatic promotion can be enabled by setting the tunable
"vm.pmap.pg_ps_enabled" to a non-zero value. By default, automatic
promotion is disabled. Tested by: kris

2. To date, we have assumed that the TLB will only set the PG_M bit in a
PTE if that PTE has the PG_RW bit set. However, this assumption does
not hold on recent processors from Intel. For example, consider a PTE
that has the PG_RW bit set but the PG_M bit clear. Suppose this PTE
is cached in the TLB and later the PG_RW bit is cleared in the PTE,
but the corresponding TLB entry is not (yet) invalidated.
Historically, upon a write access using this (stale) TLB entry, the
TLB would observe that the PG_RW bit had been cleared and initiate a
page fault, aborting the setting of the PG_M bit in the PTE. Now,
however, P4- and Core2-family processors will set the PG_M bit before
observing that the PG_RW bit is clear and initiating a page fault. In
other words, the write does not occur but the PG_M bit is still set.

The real impact of this difference is not that great. Specifically,
we should no longer assert that any PTE with the PG_M bit set must
also have the PG_RW bit set, and we should ignore the state of the
PG_M bit unless the PG_RW bit is set.
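
Hence the idiom adopted by this change: a PTE is treated as dirty only
when both bits are set.

    if ((pte & (PG_M | PG_RW)) == (PG_M | PG_RW))
        vm_page_dirty(m);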


# 177525 23-Mar-2008 kib

Prevent the overflow in the calculation of the next page directory.
The overflow causes the wraparound with consequent corruption of the
(almost) whole address space mapping.

As Alan noted, pmap_copy() does not require the wrap-around checks
because it cannot be applied to the kernel's pmap. The checks there are
included for consistency.
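
The guard is the same idiom in pmap_remove(), pmap_copy(), and friends:
clamp the next-PDE address on wraparound.

    for (; sva < eva; sva = va_next) {
        va_next = (sva + NBPDR) & ~PDRMASK;
        if (va_next < sva)    /* overflow: wrapped past the top of KVA */
            va_next = eva;
        /* ... process the PDE covering [sva, va_next) ... */
    }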

Reported and tested by: kris (i386/pmap.c:pmap_remove() part)
Reviewed by: alc
MFC after: 1 week


# 175404 17-Jan-2008 alc

Retire PMAP_DIAGNOSTIC. Any useful diagnostics that were conditionally
compiled under PMAP_DIAGNOSTIC are now KASSERT()s. (Note: The kernel
option DIAGNOSTIC still disables inlining of certain pmap functions.)

Eliminate dead code from pmap_enter(). This code implemented an assertion.
On i386, an equivalent check is already implemented. However, on amd64,
a small change is required to implement an equivalent check.

Eliminate \n from a nearby panic string.

Use KASSERT() to reimplement pmap_copy()'s two assertions.


# 175328 14-Jan-2008 peter

Add a CTASSERT that KERNBASE is valid. This is usually messed up by an
invalid KVA_PAGES, so add a pointer there.


# 175155 08-Jan-2008 alc

Convert a PMAP_DIAGNOSTIC to a KASSERT.


# 175119 06-Jan-2008 alc

Shrink the size of struct vm_page on amd64 and i386 by eliminating
pv_list_count from struct md_page. Ever since Peter rewrote the pv
entry allocator for amd64 and i386 pv_list_count has been correctly
maintained but otherwise unused.


# 175067 03-Jan-2008 alc

Add an access type parameter to pmap_enter(). It will be used to implement
superpage promotion.

Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is
a Boolean.


# 175056 02-Jan-2008 alc

Provide a legitimate pindex to vm_page_alloc() in pmap_growkernel()
instead of writing apologetic comments. As it turns out, I need every
kernel page table page to have a legitimate pindex to support superpage
promotion on kernel memory.

Correct a nearby style error: Pointers should be compared to NULL.


# 174496 09-Dec-2007 alc

Eliminate compilation warnings due to the use of non-static inlines
through the introduction and use of the __gnu89_inline attribute.

Submitted by: bde (i386)
MFC after: 3 days


# 174250 04-Dec-2007 alc

Correct an error under COUNT_IPIS within pmap_lazyfix_action(): Increment
the counter that the pointer refers to, not the pointer.

MFC after: 3 days


# 174104 30-Nov-2007 alc

Improve get_pv_entry()'s handling of low-memory conditions. After page
allocation fails and pv entries are reclaimed, there may be an unused pv
entry in a pv chunk that survived the reclamation. However, previously,
after reclamation, get_pv_entry() did not look for an unused pv entry in
a surviving pv chunk; it simply retried the page allocation. Now, it
does look for an unused pv entry before retrying the page allocation.

Note: This only applies to RELENG_7. Earlier branches use a different
pv entry allocator.

MFC after: 6 weeks


# 173708 17-Nov-2007 alc

Prevent the leakage of wired pages in the following circumstances:
First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated.
Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the
pages beyond the EOF are unmapped and freed. However, when the file is
mlock(2)ed, the pages beyond the EOF are unmapped but not freed because
they have a non-zero wire count. This can be a mistake. Specifically,
it is a mistake if the sole reason why the pages are wired is the
existence of wired, managed mappings. Previously, unmapping the pages
destroyed these wired, managed mappings but did not reduce the pages'
wire count.
Consequently, when the file is unmapped, the pages are not unwired
because the wired mapping has been destroyed. Moreover, when the vm
object is finally destroyed, the pages are leaked because they are still
wired. The fix is to reduce the pages' wire count by the number of
wired, managed mappings destroyed. To do this, I introduce a new pmap
function pmap_page_wired_mappings() that returns the number of managed
mappings to the given physical page that are wired, and I use this
function in vm_object_page_remove().

Reviewed by: tegge
MFC after: 6 weeks
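
A minimal sketch of the new function, loosely modeled on the i386 version
(locking and the fictitious-page check elided; PV_PMAP() and the pv list
field names are assumptions):

    int
    pmap_page_wired_mappings(vm_page_t m)
    {
            pv_entry_t pv;
            pt_entry_t *pte;
            int count;

            count = 0;
            TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) {
                    pte = pmap_pte(PV_PMAP(pv), pv->pv_va);
                    if ((*pte & PG_W) != 0)
                            count++;
            }
            return (count);
    }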


# 173592 13-Nov-2007 peter

Drastically simplify the i386 pcpu backend by merging parts of the
amd64 mechanism over. Instead of page table hackery that isn't
actually needed, just use 'struct pcpu __pcpu[MAXCPU]' for backing like
all the other platforms do. Get rid of 'struct privatespace' and a
whole mess of #ifdef SMP garbage that set it up. As a bonus, this
returns the 4MB of KVA that we stole to implement it the old way.
This also allows you to read the pcpu data for each cpu when reading a
minidump.

Background information: Originally, pcpu stuff was implemented as having
per-cpu page tables and magic to make different data structures appear
at the same actual address. In order to share page tables, we switched
to using the GDT and %fs/%gs to access it. But we still did the evil
magic to set it up for the old way. The "idle stacks" are not used
for the idle process anymore and are just used for a few functions during
bootup, then ignored. (exercise for the reader: free these afterwards).


# 173370 05-Nov-2007 alc

Add comments explaining why all stores updating a non-kernel page table
must be globally performed before calling any of the TLB invalidation
functions.

With one exception, on amd64, this requirement was already met. Fix this
one case. Also, as a clarification, change an existing atomic op into a
release. (Suggested by: jhb)

Reported and reviewed by: ups
MFC after: 3 days
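
The required ordering, in sketch form (pte_store() is the existing PTE
store primitive; the point is only that the store is globally performed
before any shootdown begins):

    pte_store(pte, newpte);         /* must be globally visible first... */
    pmap_invalidate_page(pmap, va); /* ...then the local invlpg and IPIs */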


# 173361 05-Nov-2007 kib

Fix for the panic("vm_thread_new: kstack allocation failed") and a
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when kmem_alloc_nofault() failed to allocate address space. Both
functions now return an error instead of panicking or dereferencing
NULL.

As a consequence, vmspace_exec() and vmspace_unshare() now return an
errno value. A struct vmspace argument was added to vm_forkproc() to
avoid dealing with a failed allocation when most of the fork1() job is
already done.

The kernel stack for the thread is now set up in thread_alloc(), which
itself may return NULL. Also, allocation of the first process thread
is performed in fork1() to properly deal with stack allocation
failure. proc_linkup() is separated into proc_linkup(), called from
fork1(), and proc_linkup0(), which is used to set up the kernel
process (formerly known as the swapper).

In collaboration with: Peter Holm
Reviewed by: jhb


# 173296 03-Nov-2007 alc

Eliminate spurious "Approaching the limit on PV entries, ..."
warnings. Specifically, whenever vm_page_alloc(9) returned NULL to
get_pv_entry(), we issued a warning regardless of the number of pv
entries in use. (Note: The older pv entry allocator in RELENG_6 does
not have this problem.)

Reported by: Jeremy Chadwick

Eliminate the direct call to pagedaemon_wakeup() by get_pv_entry().
This was a holdover from earlier times when the page daemon was
responsible for the reclamation of pv entries.

MFC after: 5 days


# 171907 21-Aug-2007 alc

In general, when we map a page into the kernel's address space, we no
longer create a pv entry for that mapping. (The two exceptions are
mappings into the kernel's exec and pipe submaps.) Consequently, there is
no reason for get_pv_entry() to dig deep into the free page queues, i.e.,
use VM_ALLOC_SYSTEM, by default. This revision changes get_pv_entry() to
use VM_ALLOC_NORMAL by default, i.e., before calling pmap_collect() to
reclaim pv entries.

Approved by: re (kensmith)


# 171128 01-Jul-2007 alc

Pages that do not belong to an object and are not in a page queue can
now be freed without holding the page queues lock. Thus, the page
table pages released by pmap_remove() and pmap_remove_pages() can be
freed after the page queues lock is released.

Approved by: re (kensmith)


# 170170 31-May-2007 attilio

Revert VMCNT_* operations introduction.
Probably, a general approach is not the best solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)


# 169805 20-May-2007 jeff

- rename VMCNT_DEC to VMCNT_SUB to reflect the count argument.

Suggested by: julian@
Contributed by: attilio@


# 169667 18-May-2007 jeff

- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.

Contributed by: Attilio Rao <attilio@FreeBSD.org>


# 169040 25-Apr-2007 ups

Forced commit to add more comments to:

1.583 src/sys/amd64/amd64/pmap.c
1.588 src/sys/i386/i386/pmap.c

Invalidate all TLBs and page structure caches that may reference a page
for page walk acceleration before releasing it to the free list.

This is required for CPUs implementing page structure caches as described
in the Intel application note:
"TLBs, Paging-Structure Caches, and Their Invalidation"
(Document Number: 317080-001)

Reviewed by: Siddha Suresh (Intel), alc@, peter@


# 168930 21-Apr-2007 ups

Modify TLB invalidation handling.

Reviewed by: alc@, peter@
MFC after: 1 week


# 168691 13-Apr-2007 alc

Eliminate the misuse of PG_FRAME to truncate a virtual address to a virtual
page boundary.

Reviewed by: ru@


# 168439 06-Apr-2007 ru

Add the PG_NX support for i386/PAE.

Reviewed by: alc
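
Under PAE the PTE widens to 64 bits and the no-execute bit is the topmost
one; a sketch of the relevant pieces (pg_nx being the runtime mask that is
non-zero only when the CPU supports NX):

    #define PG_NX   (1ull << 63)    /* only meaningful with 64-bit PTEs */

    /* In pmap_enter(), executability is withheld when not requested: */
    if ((prot & VM_PROT_EXECUTE) == 0)
            newpte |= pg_nx;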


# 167869 24-Mar-2007 alc

In order to satisfy ACPI's need for an identity mapping, modify the
temporary mapping created by locore so that the lowest two to four
megabytes can become a permanent identity mapping. This implementation
avoids any use of a large page mapping.


# 167668 17-Mar-2007 alc

Eliminate an unused parameter.


# 167578 14-Mar-2007 njl

Create an identity mapping (V=P) super page for the low memory region on
boot. Then, just switch to the kernel pmap when suspending instead of
allocating/freeing our own mapping every time. This should solve a panic
of pmap_remove() being called with interrupts disabled. Thanks to Alan
Cox for developing this patch.

Note: this means that ACPI requires super page (PG_PS) support in the CPU.
This has been present since the Pentium and first documented in the
Pentium Pro. However, it may need to be revisited later.

Submitted by: alc
MFC after: 1 month


# 167250 05-Mar-2007 alc

Acquiring smp_ipi_mtx on every call to pmap_invalidate_*() is wasteful.
For example, during a buildworld more than half of the calls do not
generate an IPI because the only TLB entry invalidated is on the calling
processor. This revision pushes down the acquisition and release of
smp_ipi_mtx into smp_tlb_shootdown() and smp_targeted_tlb_shootdown() and
instead uses sched_pin() and sched_unpin() in pmap_invalidate_*() so that
thread migration doesn't lead to a missed TLB invalidation.

Reviewed by: jhb
MFC after: 3 weeks


# 166810 18-Feb-2007 alc

Eliminate some acquisitions and releases of the page queues lock that are
no longer necessary.


# 166074 17-Jan-2007 delphij

Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.


# 164413 19-Nov-2006 alc

The global variable avail_end is redundant and only used once. Eliminate
it. Make avail_start static to the pmap on amd64. (It no longer exists
on other architectures.)


# 164332 16-Nov-2006 maxim

o Make pv_maxchunks no less than maxproc. This helps to survive a
forkbomb explosion.

Reviewed by: alc
Security: local DoS
X-MFC after: RELENG_6 is not affected due to a different pv_entry
allocation code.


# 164229 12-Nov-2006 alc

Make pmap_enter() responsible for setting PG_WRITEABLE instead
of its caller. (As a beneficial side-effect, a high-contention
acquisition of the page queues lock in vm_fault() is eliminated.)


# 163603 22-Oct-2006 alc

Eliminate unnecessary PG_BUSY tests.


# 161285 14-Aug-2006 jhb

Don't try to preserve PAT bits in pmap_enter(). We currently only use
PAT bits on pages that aren't mapped via pmap_enter() (KVA). We will
eventually support PAT bits
on user pages, but those will require some sort of MI caching mode stored
in the vm_page.

Reviewed by: alc


# 161223 11-Aug-2006 jhb

First pass at allowing memory to be mapped using cache modes other than
WB (write-back) on x86 via control bits in PTEs and PDEs (including making
use of the PAT MSR). Changes include:
- A new pmap_mapdev_attr() function for amd64 and i386 which takes an
additional parameter (relative to pmap_mapdev()) specifying the cache
mode for this mapping. Note that on amd64 only WB mappings are done with
the direct map, all other modes result in a private mapping.
- pmap_mapdev() on i386 and amd64 now defaults to using UC (uncached)
mappings rather than WB. Previously we relied on the BIOS setting up
MTRRs to enforce memio regions being treated as UC. For example, this
might now make hw.cbb_start_memory unnecessary in some cases.
- A new pmap_mapbios()/pmap_unmapbios() API has been added to allow places
that used pmap_mapdev() to map non-device memory (such as ACPI tables)
to do so using WB as before.
- A new pmap_change_attr() function for amd64 and i386 that changes the
caching mode for a range of KVA.

Reviewed by: alc


# 161016 06-Aug-2006 alc

Eliminate the acquisition and release of the page queues lock around a call
to vm_page_sleep_if_busy().


# 160889 01-Aug-2006 alc

Complete the transition from pmap_page_protect() to pmap_remove_write().
Originally, I had adopted sparc64's name, pmap_clear_write(), for the
function that is now pmap_remove_write(). However, this function is more
like pmap_remove_all() than like pmap_clear_modify() or
pmap_clear_reference(), hence, the name change.

The higher-level rationale behind this change is described in
src/sys/amd64/amd64/pmap.c revision 1.567. The short version is that I'm
trying to clean up and fix our support for execute access.

Reviewed by: marcel@ (ia64)


# 160525 20-Jul-2006 alc

Add pmap_clear_write() to the interface between the virtual memory
system's machine-dependent and machine-independent layers. Once
pmap_clear_write() is implemented on all of our supported
architectures, I intend to replace all calls to pmap_page_protect() by
calls to pmap_clear_write(). Why? Both the use and implementation of
pmap_page_protect() in our virtual memory system has subtle errors,
specifically, the management of execute permission is broken on some
architectures. The "prot" argument to pmap_page_protect() should
behave differently from the "prot" argument to other pmap functions.
Instead of meaning, "give the specified access rights to all of the
physical page's mappings," it means "don't take away the specified
access rights from all of the physical page's mappings, but do take
away the ones that aren't specified." However, owing to our i386
legacy, i.e., no support for no-execute rights, all but one invocation
of pmap_page_protect() specifies VM_PROT_READ only, when the intent
is, in fact, to remove only write permission. Consequently, a
faithful implementation of pmap_page_protect(), e.g., ia64, would
remove execute permission as well as write permission. On the other
hand, some architectures that support execute permission have
basically ignored whether or not VM_PROT_EXECUTE is passed to
pmap_page_protect(), e.g., amd64 and sparc64. This change represents
the first step in replacing pmap_page_protect() by the less subtle
pmap_clear_write() that is already implemented on amd64, i386, and
sparc64.

Discussed with: grehan@ and marcel@


# 160461 18-Jul-2006 alc

MFamd64
pmap_clear_ptes() is already convoluted. This will worsen with the
implementation of superpages. Eliminate it and add pmap_clear_write().

There are no functional changes. Checked by: md5


# 160419 17-Jul-2006 alc

Now that free_pv_entry() accesses the pmap, call free_pv_entry() in
pmap_remove_all() before rather than after the pmap is unlocked. At
present, the page queues lock provides sufficient synchronization. In the
future, the page queues lock may not always be held when free_pv_entry() is
called.


# 160412 16-Jul-2006 alc

MFamd64
Make three simplifications to pmap_ts_referenced():
Eliminate an initialized but otherwise unused variable.
Eliminate an unnecessary test.
Exit the loop in a shorter way.


# 160408 16-Jul-2006 alc

Eliminate the remaining uses of "register".

Convert the remaining K&R-style function declarations to ANSI-style.

Eliminate excessive white space from pmap_ts_referenced().


# 160379 15-Jul-2006 alc

Make pc_freemask an array of uint32_t, rather than uint64_t. (I believe
that the use of the latter is simply an oversight in porting the new pv
entry code from amd64.)


# 160073 02-Jul-2006 alc

Correct an error in the new pmap_collect(), thus only affecting HEAD.
Specifically, the pv entry was always being freed to the caller's pmap
instead of the pmap to which the pv entry belongs.


# 159970 27-Jun-2006 alc

Correct a very old and very obscure bug: vmspace_fork() calls
pmap_copy() if the mapping is VM_INHERIT_SHARE. Suppose the mapping
is also wired. vmspace_fork() clears the wiring attributes in the vm
map entry but pmap_copy() copies the PG_W attribute in the PTE. I
don't think this is catastrophic. It blocks pmap_remove_pages() from
destroying the mapping and corrupts the pmap's wiring count.

This revision fixes the problem by changing pmap_copy() to clear the
PG_W attribute.

Reviewed by: tegge@


# 159933 25-Jun-2006 alc

Eliminate a comment that became stale after revision 1.547.


# 159803 20-Jun-2006 alc

Change get_pv_entry() such that the call to vm_page_alloc() specifies
VM_ALLOC_NORMAL instead of VM_ALLOC_SYSTEM when try is TRUE. In other
words, when get_pv_entry() is permitted to fail, it no longer tries as
hard to allocate a page.

Change pmap_enter_quick_locked() to fail rather than wait if it is
unable to allocate a page table page. This prevents a race between
pmap_enter_object() and the page daemon. Specifically, an inactive
page that is a successor to the page that was given to
pmap_enter_quick_locked() might become a cache page while
pmap_enter_quick_locked() waits and later pmap_enter_object() maps
the cache page violating the invariant that cache pages are never
mapped. Similarly, change
pmap_enter_quick_locked() to call pmap_try_insert_pv_entry() rather
than pmap_insert_entry(). Generally speaking,
pmap_enter_quick_locked() is used to create speculative mappings. So,
it should not try hard to allocate memory if free memory is scarce.

Add an assertion that the object containing m_start is locked in
pmap_enter_object(). Remove a similar assertion from
pmap_enter_quick_locked() because that function no longer accesses the
containing object.

Remove a stale comment.

Reviewed by: ups@


# 159627 14-Jun-2006 ups

Remove mpte optimization from pmap_enter_quick().
There is a race with the current locking scheme and removing
it should have no measurable performance impact.
This fixes page faults leading to panics in pmap_enter_quick_locked()
on amd64/i386.

Reviewed by: alc,jhb,peter,ps


# 159546 12-Jun-2006 alc

Don't invalidate the TLB in pmap_qenter() unless the old mapping was valid.
Most often, it isn't.

Reviewed by: tegge@


# 159303 05-Jun-2006 alc

Introduce the function pmap_enter_object(). It maps a sequence of resident
pages from the same object. Use it in vm_map_pmap_enter() to reduce the
locking overhead of premapping objects.

Reviewed by: tegge@


# 159244 05-Jun-2006 alc

MFamd64
Eliminate unnecessary, recursive acquisitions and releases of the page
queues lock by free_pv_entry() and pmap_remove_pages().

Reduce the scope of the page queues lock in pmap_remove_pages().


# 158238 01-May-2006 jhb

Add various constants for the PAT MSR and the PAT PTE and PDE flags.
Initialize the PAT MSR during boot to map PAT type 2 to Write-Combining
(WC) instead of Uncached (UC-).

MFC after: 1 month
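
The boot-time reprogramming, in sketch form (MSR_PAT and
PAT_WRITE_COMBINING are the real constants; the mask arithmetic is
illustrative, each PAT entry being an 8-bit field):

    uint64_t pat_msr;

    pat_msr = rdmsr(MSR_PAT);
    pat_msr &= ~(0xffULL << 16);                    /* PAT entry 2 (UC-) */
    pat_msr |= (uint64_t)PAT_WRITE_COMBINING << 16; /* becomes WC */
    wrmsr(MSR_PAT, pat_msr);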


# 158236 01-May-2006 jhb

Add a new 'pmap_invalidate_cache()' to flush the CPU caches via the
wbinvd() instruction. This includes a new IPI so that all CPU caches on
all CPUs are flushed for the SMP case.

MFC after: 1 month


# 158235 01-May-2006 peter

Using an idea from Stephan Uphoff, use the empty pte's that correspond
to the unused kva in the pv memory block to thread a freelist through.
This allows us to free pages that used to be used for pv entry chunks
since we can now track holes in the kva memory block.

Idea from: ups


# 158226 01-May-2006 peter

Fix missing changes required for the amd64->i386 conversion. Add the
missing VM_ALLOC_WIRED flags to vm_page_alloc() calls I added.

Submitted by: alc


# 158120 28-Apr-2006 peter

Interim fix for pmap problems I introduced with my last commit.
Remove the code to dynamically change the pv_entry limits. Go back
to a single fixed kva reservation for pv entries, like was done
before when using the uma zone. Go back to never freeing pages
back to the free pool after they are no longer used, just like
before.

This stops the lock order reversal due to acquiring the kernel map
lock while pmap was locked.

This fixes the recursive panic if invariants are enabled.

The problem was that allocating/freeing kva causes vm_map_entry
nodes to be allocated/freed. That can recurse back into pmap as
new pages are hooked up to kvm, and hence all the problems.
Allocating/freeing kva indirectly allocates/frees memory.

So, by going back to a single fixed size kva block and an index,
we avoid the recursion panics and the LOR.

The problem is that now with a linear block of kva, we have no
mechanism to track holes once pages are freed. UMA has the same
problem when using a custom object for a zone and a fixed reservation
of kva. Simple solutions like having a bitmap would work, but would
be very inefficient when there are hundreds of thousands of bits
in the map. A first-free pointer is similarly flawed because pages
can be freed at random and the first-free pointer would be rewinding
huge amounts. If we could allocate memory for tree structures or
an external freelist, that would work. Except we cannot allocate/free
memory here because we cannot allocate/free address space to use
it in. Anyway, my change here reverts back to the UMA behavior of
not freeing pages for now, thereby avoiding holes in the map.

ups@ had a truly evil idea that I'll investigate. It should allow
freeing unused pages again by giving us a no-cost way to track the
holes in the kva block. But in the meantime, this should get people
booting with witness and/or invariants again.

Footnote: amd64 doesn't have this problem because of the direct map
access method. I'd done all my witness/invariants testing there. I'd
never considered that the harmless-looking kmem_alloc/kmem_free calls
would cause such a problem and it didn't show up on the boot test.


# 158088 27-Apr-2006 alc

In general, bits in the page directory entry (PDE) and the page table
entry (PTE) have the same meaning. The exception to this rule is the
eighth bit (0x080). It is the PS bit in a PDE and the PAT bit in a
PTE. This change avoids the possibility that pmap_enter() confuses a
PAT bit with a PS bit, avoiding a panic().

Eliminate a diagnostic printf() from the i386 pmap_enter() that serves
no current purpose, i.e., I've seen no bug reports in the last two
years that are helped by this printf().

Reviewed by: jhb


# 158067 27-Apr-2006 delphij

Fix build on i386


# 158060 26-Apr-2006 peter

MFamd64: shrink pv entries from 24 bytes to about 12 bytes. (336 pv entries
per page = effectively 12.19 bytes per pv entry after overheads).
Instead of using a shared UMA zone for 24 byte pv entries (two 8-byte tailq
nodes, a 4 byte pointer, and a 4 byte address), we allocate a page at a
time per process. This provides 336 pv entries per process (actually, per
pmap address space) and eliminates one of the 8-byte tailq entries since
we now can track per-process pv entries implicitly. The pointer to
the pmap can be eliminated by doing address arithmetic on the page
header metadata to find a single pointer shared by all 336 entries.
There is an 11-int bitmap for the freelist of those 336 entries.

This is mostly a mechanical conversion from amd64, except:
* i386 has to allocate kvm and map the pages, amd64 has them outside of kvm
* native word size is smaller, so bitmaps etc become 32 bit instead of 64
* no dump_add_page() etc stuff because they are in kvm always.
* various pmap internals tweaks because pmap uses direct map on amd64 but
on i386 it has to use sched_pin and temporary mappings.

Also, sysctl vm.pmap.pv_entry_max and vm.pmap.shpgperproc are now
dynamic sysctls. Like on amd64, i386 can now tune the pv entry limits
without a recompile or reboot.

This is important because of the following scenario. If you have a 1GB
file (262144 pages) mmap()ed into 50 processes, that requires 13 million
pv entries. At 24 bytes per pv entry, that is 314MB of ram and kvm, while
at 12 bytes it is 157MB. A 157MB saving is significant.

Test-run by: scottl (Thanks!)
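
The chunk layout, in sketch form (field names follow the amd64 original;
on i386 the freemask is eleven 32-bit words covering the 336 entries):

    #define _NPCM   11
    #define _NPCPV  336
    struct pv_chunk {
            pmap_t                  pc_pmap;  /* shared by all 336 entries */
            TAILQ_ENTRY(pv_chunk)   pc_list;
            uint32_t                pc_map[_NPCM];  /* bit set: entry free */
            struct pv_entry         pc_pventry[_NPCPV];
    };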


# 157680 12-Apr-2006 alc

Retire pmap_track_modified(). We no longer need it because we do not
create managed mappings within the clean submap. To prevent regressions,
add assertions blocking the creation of managed mappings within the clean
submap.

Reviewed by: tegge


# 157443 03-Apr-2006 peter

Remove the unused sva and eva arguments from pmap_remove_pages().


# 157394 02-Apr-2006 alc

Introduce pmap_try_insert_pv_entry(), a function that conditionally creates
a pv entry if the number of entries is below the high water mark for pv
entries.

Use pmap_try_insert_pv_entry() in pmap_copy() instead of
pmap_insert_entry(). This avoids possible recursion on a pmap lock in
get_pv_entry().

Eliminate the explicit low-memory checks in pmap_copy(). The check that
the number of pv entries was below the high water mark was largely
ineffective because it was located in the outer loop rather than the
inner loop where pv entries were allocated. Instead of checking, we
attempt the allocation and handle the failure.

Reviewed by: tegge
Reported by: kris
MFC after: 5 days
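
A sketch of the conditional variant (pv_entry_count and
pv_entry_high_water are the existing accounting globals; the allocation
and list manipulation details are simplified):

    static boolean_t
    pmap_try_insert_pv_entry(pmap_t pmap, vm_offset_t va, vm_page_t m)
    {
            pv_entry_t pv;

            /* Refuse, rather than reclaim, above the high-water mark. */
            if (pv_entry_count >= pv_entry_high_water)
                    return (FALSE);
            pv = get_pv_entry();
            pv->pv_pmap = pmap;
            pv->pv_va = va;
            TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_list);
            pv_entry_count++;
            return (TRUE);
    }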


# 156963 21-Mar-2006 alc

Eliminate unnecessary invalidations of the entire TLB by pmap_remove().
Specifically, on mappings with PG_G set pmap_remove() not only performs
the necessary per-page invlpg invalidations but also performs an
unnecessary invalidation of the entire set of non-PG_G entries.

Reviewed by: tegge


# 156930 21-Mar-2006 davidxu

Remove stale KSE code.

Reviewed by: alc


# 155771 16-Feb-2006 tegge

Rounding addr upwards to next 4M or 2M boundary in pmap_growkernel() could
cause addr to become 0, resulting in an early return without populating
the last PDE.

Reviewed by: alc


# 153141 05-Dec-2005 jhb

- Move the code to deal with handling an IPI_STOP IPI out of
ipi_nmi_handler() and into a new cpustop_handler() function. Change
the Xcpustop IPI_STOP handler to call this function instead of
duplicating all the same logic in assembly.
- EOI the local APIC for the lapic timer interrupt in C rather than
assembly.
- Bump the lazypmap IPI counter if COUNT_IPIS is defined in C rather than
assembly.


# 152630 20-Nov-2005 alc

Eliminate pmap_init2(). It's no longer used.


# 152404 13-Nov-2005 imp

Provide a dummy NO_XBOX option that lives in opt_xbox.h for pc98.
This allows us to eliminate three #ifdef PC98 instances.


# 152359 13-Nov-2005 alc

In get_pv_entry() use PMAP_LOCK() instead of PMAP_TRYLOCK() when deadlock
cannot possibly occur.


# 152238 09-Nov-2005 nyan

Fix pc98 build.


# 152224 09-Nov-2005 alc

Reimplement the reclamation of PV entries. Specifically, perform
reclamation synchronously from get_pv_entry() instead of
asynchronously as part of the page daemon. Additionally, limit the
reclamation to inactive pages unless allocation from the PV entry zone
or reclamation from the inactive queue fails. Previously, reclamation
destroyed mappings to both inactive and active pages. get_pv_entry()
still, however, wakes up the page daemon when reclamation occurs. The
reason being that the page daemon may move some pages from the active
queue to the inactive queue, making some new pages available to future
reclamations.

Print the "reclaiming PV entries" message at most once per minute, but
don't stop printing it after the fifth time. This way, we do not give
the impression that the problem has gone away.

Reviewed by: tegge
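
The once-per-minute printing can be expressed with ratecheck(9); a sketch
(message text abbreviated):

    static const struct timeval printinterval = { 60, 0 }; /* once a minute */
    static struct timeval lastprint;

    if (ratecheck(&lastprint, &printinterval))
            printf("pmap: reclaiming PV entries\n");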


# 152219 09-Nov-2005 imp

Add support for XBOX to the FreeBSD port. The xbox architecture is
nearly identical to wintel/ia32, with a couple of tweaks. Since it is
so similar to ia32, it is optionally added to a i386 kernel. This
port is preliminary, but seems to work well. Further improvements
will improve the interaction with syscons(4), port Linux nforce driver
and future versions of the xbox.

This supports the 64MB and 128MB boxes. You'll need the most recent
CVS version of Cromwell (the Linux BIOS for the XBOX) to boot.

Rink will be maintaining this port, and is interested in feedback.
He's set up a website http://xbox-bsd.nl to report the latest
developments.

Any silly mistakes are my fault.

Submitted by: Rink P.W. Springer rink at stack dot nl and
Ed Schouten ed at fxq dot nl


# 152042 04-Nov-2005 alc

Begin and end the initialization of pvzone in pmap_init().
Previously, pvzone's initialization was split between pmap_init() and
pmap_init2(). This split initialization was the underlying cause of
some UMA panics during initialization. Specifically, if the UMA boot
pages were exhausted before the pvzone was fully initialized, then UMA,
through no fault of its own, would use an inappropriate back-end
allocator leading to a panic. (Previously, as a workaround, we have
increased the UMA boot pages.) Fortunately, there is no longer any
reason that pvzone's initialization cannot be completed in
pmap_init().

Eliminate a check for whether pv_entry_high_water has been initialized
or not from get_pv_entry(). Since pvzone's initialization is
completed in pmap_init(), this check is no longer needed.

Use cnt.v_page_count, the actual count of available physical pages,
instead of vm_page_array_size to compute the maximum number of pv
entries.

Introduce the vm.pmap.pv_entries tunable on alpha and ia64.

Eliminate some unnecessary white space.

Discussed with: tegge (item #1)
Tested by: marcel (ia64)


# 151910 31-Oct-2005 alc

Instead of panic()ing in pmap_insert_entry() if get_pv_entry()
fails, reclaim a pv entry by destroying a mapping to an inactive
page.

Change the format strings in many of the assertions that were recently
converted from PMAP_DIAGNOSTIC printf()s so that they are compatible
with PAE. Avoid unnecessary differences between the amd64 and i386
format strings.


# 151889 30-Oct-2005 alc

Replace diagnostic printf()s by assertions. Use consistent style for
similar assertions.


# 151543 21-Oct-2005 ade

Specifically panic() in the case where pmap_insert_entry() fails to
get a new pv under high system load where the available pv entries
have been exhausted before the pagedaemon has a chance to wake up
to reclaim some.

Prior to this, the NULL pointer dereference ended up causing
secondary panics with rather less than useful resulting tracebacks.

Reviewed by: alc, jhb
MFC after: 1 week


# 149786 04-Sep-2005 alc

Eliminate unnecessary TLB invalidations by pmap_enter(). Specifically,
eliminate TLB invalidations when permissions are relaxed, such as when a
read-only mapping is changed to a read/write mapping. Additionally,
eliminate TLB invalidations when bits that are ignored by the hardware,
such as PG_W ("wired mapping"), are changed.

Reviewed by: tegge


# 149768 03-Sep-2005 alc

Pass a value of type vm_prot_t to pmap_enter_quick() so that it can determine
whether the mapping should permit execute access.


# 149532 27-Aug-2005 alc

MFamd64 revision 1.526
When pmap_allocpte() destroys a 2/4MB "superpage" mapping it does not
reduce the pmap's resident count accordingly. It should.


# 149058 14-Aug-2005 alc

Simplify the page table page reference counting by pmap_enter()'s change of
mapping case.

Eliminate a stale comment from pmap_enter().

Reviewed by: tegge


# 148976 11-Aug-2005 alc

Eliminate unneeded diagnostic code.

Eliminate an unused #include. (Kernel stack allocation and deallocation
long ago migrated to the machine-independent code.)


# 148971 11-Aug-2005 alc

Eliminate unneeded diagnostic code.

Reviewed by: tegge


# 148952 11-Aug-2005 alc

Decouple the unrefing of a page table page from the removal of a pv entry.
In other words, change pmap_remove_entry() such that it no longer unrefs
the page table page. Now, it only removes the pv entry.

Reviewed by: tegge


# 148835 07-Aug-2005 alc

When support for 2MB/4MB pages was added in revision 1.148 an error was
made in pmap_protect(): The pmap's resident count should not be reduced
unless mappings are removed.

The errant change to the pmap's resident count could result in a later
pmap_remove() failing to remove any mappings if the errant change has set
the pmap's resident count to zero.


# 148539 29-Jul-2005 jhb

Fix a bug in pmap_protect() in the PAE case where it would try to look up
the vm_page_t associated with a pte using only the lower 32-bits of the pte
instead of the full 64-bits.

Submitted by: Greg Taleck greg at isilon dot com
Reviewed by: jeffr, alc
MFC after: 3 days
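
The bug in sketch form: under PAE, pt_entry_t is 64 bits wide, so the
intermediate variable must not be a plain int:

    pt_entry_t pbits;       /* was an int, truncating the upper 32 bits */
    vm_page_t m;

    pbits = *pte;
    m = PHYS_TO_VM_PAGE(pbits & PG_FRAME);  /* full 64-bit frame address */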


# 147741 02-Jul-2005 delphij

Remove the CPU_ENABLE_SSE option from the i386 and pc98 architectures,
as it has been the default for I686_CPU for almost 3 years, and
CPU_DISABLE_SSE always disables it. On the other hand, CPU_ENABLE_SSE
does not work for I486_CPU and I586_CPU.

This commit has:
- Removed the option from conf/options.*
- Removed the option and comments from MD NOTES files
- Simplified the CPU_ENABLE_SSE ifdef's so they don't
deal with CPU_ENABLE_SSE from kernel configuration. (*)

For most users, this commit should be largely no-op. If you used to
place CPU_ENABLE_SSE into your kernel configuration for some reason,
it is time to remove it.

(*) The ifdef's of CPU_ENABLE_SSE are not removed at this point, since
we need to change it to !defined(CPU_DISABLE_SSE) && defined(I686_CPU),
not just !defined(CPU_DISABLE_SSE), if we really want to do so.

Discussed on: -arch
Approved by: re (scottl)


# 147217 10-Jun-2005 alc

Introduce a procedure, pmap_page_init(), that initializes the
vm_page's machine-dependent fields. Use this function in
vm_pageq_add_new_page() so that the vm_page's machine-dependent and
machine-independent fields are initialized at the same time.

Remove code from pmap_init() for initializing the vm_page's
machine-dependent fields.

Remove stale comments from pmap_init().

Eliminate the Boolean variable pmap_initialized from the alpha, amd64,
i386, and ia64 pmap implementations. Its use is no longer required
because of the above changes and earlier changes that result in physical
memory that is being mapped at initialization time being mapped without
pv entries.

Tested by: cognet, kensmith, marcel


# 141367 05-Feb-2005 alc

Implement proper handling of PG_G mappings in pmap_protect(). (I don't
believe that this omission mattered before the introduction of MemGuard.)

Reviewed by: tegge@
MFC after: 1 week


# 139241 23-Dec-2004 alc

Modify pmap_enter_quick() so that it expects the page queues to be locked
on entry and it assumes the responsibility for releasing the page queues
lock if it must sleep.

Remove a bogus comment from pmap_enter_quick().

Using the first change, modify vm_map_pmap_enter() so that the page queues
lock is acquired and released once, rather than each time that a page
is mapped.


# 138897 15-Dec-2004 alc

In the common case, pmap_enter_quick() completes without sleeping.
In such cases, the busying of the page and the unlocking of the
containing object by vm_map_pmap_enter() and vm_fault_prefault() is
unnecessary overhead. To eliminate this overhead, this change
modifies pmap_enter_quick() so that it expects the object to be locked
on entry and it assumes the responsibility for busying the page and
unlocking the object if it must sleep. Note: alpha, amd64, i386 and
ia64 are the only implementations optimized by this change; arm,
powerpc, and sparc64 still conservatively busy the page and unlock the
object within every pmap_enter_quick() call.

Additionally, this change is the first case where we synchronize
access to the page's PG_BUSY flag and busy field using the containing
object's lock rather than the global page queues lock. (Modifications
to the page's PG_BUSY flag and busy field have asserted both locks for
several weeks, enabling an incremental transition.)


# 138504 07-Dec-2004 ups

Move reading the current CPU mask in pmap_lazyfix() to where the thread
is protected from migrating to another CPU.

Approved by: sam (mentor)
MFC after: 4 weeks


# 138303 02-Dec-2004 alc

For efficiency move the call to pmap_pte_quick() out of pmap_protect()'s
and pmap_remove()'s inner loop.

Reviewed by: peter@, tegge@


# 138129 27-Nov-2004 das

Don't include sys/user.h merely for its side-effect of recursively
including other headers.


# 137784 16-Nov-2004 jhb

Initiate deorbit burn sequence for 80386 support in FreeBSD: Remove
80386 (I386_CPU) support from the kernel.


# 137052 29-Oct-2004 alc

Implement per-CPU SYSMAPs, i.e., CADDR* and CMAP*, to reduce lock
contention within pmap_zero_page() and pmap_copy_page().


# 136252 08-Oct-2004 alc

Make pte_load_store() an atomic operation in all cases, not just i386 PAE.

Restructure pmap_enter() to prevent the loss of a page modified (PG_M) bit
in a race between processors. (This restructuring assumes the newly atomic
pte_load_store() for correct operation.)

Reviewed by: tegge@
PR: i386/61852
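
A sketch of an atomic load-and-store for the non-PAE case (an xchg with a
memory operand is implicitly locked; the committed primitive differs in
detail):

    static __inline pt_entry_t
    pte_load_store(pt_entry_t *ptep, pt_entry_t pte)
    {
            __asm __volatile("xchgl %1, %0"
                : "+m" (*ptep), "+r" (pte));
            return (pte);           /* the previous PTE contents */
    }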


# 136101 03-Oct-2004 alc

Undo revision 1.251. This change was a performance pessimizing work-around
that is no longer required. (In fact, it is not clear that it was ever
required in HEAD or RELENG_4, only RELENG_3 required a work-around.) Now,
as before revision 1.251, if the preexisting PTE is invalid, pmap_enter()
does not call pmap_invalidate_page() to update the TLB(s).

Note: Even with this change, the handling of a copy-on-write fault is
inefficient, in such cases pmap_enter() calls pmap_invalidate_page() twice.

Discussed with: bde@
PR: kern/16568


# 136070 02-Oct-2004 alc

The physical address stored in the vm_page is page aligned. There is no
need to mask off the page offset bits. (This operation made some sense
prior to i386/i386/pmap.c revision 1.254 when we passed a physical address
rather than a vm_page pointer to pmap_enter().)


# 136050 02-Oct-2004 alc

Eliminate unnecessary uses of PHYS_TO_VM_PAGE() from pmap_enter(). These
uses predate the change in the pmap_enter() interface that replaced the
page's physical address by the address of its vm_page structure. The
PHYS_TO_VM_PAGE() was being used to compute the address of the same vm_page
structure that was being passed in.


# 135939 29-Sep-2004 alc

Prevent the unexpected deallocation of a page table page while performing
pmap_copy(). This entails additional locking in pmap_copy() and the
addition of a "flags" parameter to the page table page allocator for
specifying whether it may sleep when memory is unavailable. (Already,
pmap_copy() checks the availability of memory, aborting if it is scarce.
In theory, another CPU could, however, allocate memory between
pmap_copy()'s check and the call to the page table page allocator,
causing the current thread to release its locks and sleep. This change
makes this scenario impossible.)

Reviewed by: tegge@


# 135565 22-Sep-2004 alc

Correct a long-standing error in _pmap_unwire_pte_hold() affecting
multiprocessors. Specifically, the error is conditioning the call to
pmap_invalidate_page() on whether the pmap is active on the current CPU.
This call must be unconditional. Regardless of whether the pmap is active
on the CPU performing _pmap_unwire_pte_hold(), it could be active on another
CPU. For example, a call to pmap_remove_all() by the page daemon could
result in a call to _pmap_unwire_pte_hold() with the pmap inactive on the
current CPU and active on another CPU. In such circumstances, failing to
call pmap_invalidate_page() results in a stale TLB entry on the other CPU
that still maps the now deallocated page table page. What happens next is
typically a mysterious panic in pmap_enter() by the other CPU, either
"pmap_enter: attempted pmap_enter on 4MB page" or "pmap_enter: pte vanished,
va: 0x%lx". Both occur because the former page table page has been recycled
and allocated to a new purpose. Consequently, it no longer contains zeroes.

See also Peter's i386/i386/pmap.c revision 1.448 and the related e-mail
thread last year.

Many thanks to the engineers at Sandvine for providing clear and concise
information until all of the pieces of the puzzle fell into place and
for testing an earlier patch.

MT5 Candidate


# 135479 19-Sep-2004 alc

Simplify the reference counting of page table pages. Specifically, use
the page table page's wired count rather than its hold count to contain
the reference count. My rationale for this change is based on several
factors:

1. The machine-independent and pmap layers used the same hold count field
in subtly different ways. The machine-independent layer uses the hold
count to implement a form of ephemeral wiring that is used by pipes,
physio, etc. In other words, subsystems where we wish to temporarily
block a page from being swapped out while it is mapped into the kernel's
address space. Such pages are never removed from the page queues.
Instead, the page daemon recognizes a non-zero hold count to mean "hands
off this page." In contrast, page table pages are never in the page
queues; they are wired from birth to death. The hold count was being
used as a kind of reference count, specifically, the number of valid
page table entries within the page. Not surprisingly, these two
different uses imply different synchronization rules: in the machine-
independent layer access to the hold count requires the page queues
lock; whereas in the pmap layer the pmap lock is required. Thus,
continued use by the pmap layer of vm_page_unhold(), which asserts that
the page queues lock is held, made no sense.

2. _pmap_unwire_pte_hold() was too forgiving in its handling of the wired
count. An unexpected wired count on a page table page was ignored and
the underlying page leaked.

3. In a word, microoptimization. Using the wired count exclusively, rather
than a combination of the wired and hold counts, makes the code slightly
smaller and faster.

Reviewed by: tegge@
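
In sketch form, the reference count now lives entirely in the wired
count (illustrative; the real code differs in detail):

    /* A PTE became valid in this page table page: */
    m->wire_count++;

    /* A PTE went away; free the PTP once the last reference is gone: */
    if (--m->wire_count == 0)
            _pmap_unwire_pte_hold(pmap, m);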


# 135452 19-Sep-2004 alc

Remove an outdated assertion from _pmap_allocpte(). (When vm_page_alloc()
succeeds, the page's queue field is unconditionally set to PQ_NONE by
vm_pageq_remove_nowakeup().)


# 135443 18-Sep-2004 alc

Release the page queues lock earlier in pmap_protect() and pmap_remove() in
order to reduce contention.


# 135123 12-Sep-2004 alc

Use an atomic op to update the pte in pmap_protect(). This is to prevent
the loss of a page modified (PG_M) bit in a race between processors.

Quoting Tor:
One scenario where the old code could cause a lost PG_M bit is a
multithreaded linux program (or FreeBSD program using the
linuxthreads port) where one thread was starting a subprocess.
The thread doing fork() would call vmspace_fork(), which would then
call vm_map_copy_entry() which would call pmap_protect() on an area
possibly accessed by other threads.

Additionally, make the clearing of PG_M by pmap_protect() unconditional if
write permission is removed. Previously, PG_M could persist on a read-only
unmanaged page. That seems inconsistent and confusing.

In collaboration with: tegge@

MT5 candidate
PR: 61852
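
A sketch of the lock-free update for the non-PAE case (atomic_cmpset_int
is the real primitive; the retry loop guarantees that a concurrent
hardware PG_M store is never overwritten):

    pt_entry_t pbits;

    do {
            pbits = *pte;
    } while (!atomic_cmpset_int((u_int *)pte, pbits,
        pbits & ~(PG_RW | PG_M)));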


# 135076 11-Sep-2004 scottl

Revert the previous round of changes to td_pinned. The scheduler isn't
fully initialized when the pmap layer tries to call sched_pin() early in the
boot, which results in a quick panic. Use ke_pinned instead, as was originally
done with Tor's patch.

Approved by: julian


# 135056 10-Sep-2004 julian

Make up my mind if cpu pinning is stored in the thread structure or the
scheduler specific extension to it. Put it in the extension as
the implementation details of how the pinning is done needn't be visible
outside the scheduler.

Submitted by: tegge (of course!) (with changes)
MFC after: 3 days


# 134960 08-Sep-2004 alc

Use atomic ops in pmap_clear_ptes() to prevent SMP races that could
result in the loss of an accessed or modified bit from the pte.

In collaboration with: tegge@

MT5 candidate


# 134611 01-Sep-2004 alc

Correction to the previous revision: I forgot to apply the one's complement
to a constant. This didn't show in testing because the broken expression
produced the same result in my tests as the correct expression.


# 134607 01-Sep-2004 alc

Modify pmap_pte() to support its use on non-current, non-kernel pmaps
without holding Giant.


# 134509 30-Aug-2004 alc

Remove unnecessary check for curthread == NULL.


# 134416 27-Aug-2004 obrien

s/smp_rv_mtx/smp_ipi_mtx/g

Requested by: jhb


# 134393 27-Aug-2004 alc

The machine-independent parts of the virtual memory system always pass a
valid pmap to the pmap functions that require one. Remove the checks for
NULL. (These checks have their origins in the Mach pmap.c that was
integrated into BSD. None of the new code written specifically for
FreeBSD included them.)


# 134227 23-Aug-2004 peter

Commit Doug White and Alan Cox's fix for the cross-ipi smp deadlock.
We were obtaining different spin mutexes (which disable interrupts after
acquisition) and spin waiting for delivery. For example, KSE processes
do LDT operations which use smp_rendezvous, while other parts of the
system are doing things like tlb shootdowns with a different mutex.

This patch uses the common smp_rendezvous mutex for all MD home-grown
IPIs that spinwait for delivery. Having the single mutex means that
the spinloop to acquire it will enable interrupts periodically, thus
avoiding the cross-ipi deadlock.

Obtained from: dwhite, alc
Reviewed by: jhb


# 133292 07-Aug-2004 alc

With the advent of pmap locking it makes sense for pmap_copy() to be less
forgiving about inconsistencies in the source pmap. Also, remove a new-
line character terminating a nearby panic string.


# 133139 04-Aug-2004 jhb

Remove a potential deadlock on i386 SMP by changing the lazypmap ipi and
spin-wait code to use the same spin mutex (smp_tlb_mtx) as the TLB ipi
and spin-wait code snippets so that you can't get into the situation of
one CPU doing a TLB shootdown to another CPU that is doing a lazy pmap
shootdown each of which are waiting on each other. With this change, only
one of the CPUs would do an IPI and spin-wait at a time.


# 133124 04-Aug-2004 alc

Post-locking clean up/simplification, particularly, the elimination of
vm_page_sleep_if_busy() and the page table page's busy flag as a
synchronization mechanism on page table pages.

Also, relocate the inline pmap_unwire_pte_hold() so that it can be used
to shorten _pmap_unwire_pte_hold() on alpha and amd64. This places
pmap_unwire_pte_hold() next to a comment that more accurately describes
it than _pmap_unwire_pte_hold().


# 132919 31-Jul-2004 alc

Add pmap locking to pmap_object_init_pt().


# 132852 29-Jul-2004 alc

Advance the state of pmap locking on alpha, amd64, and i386.

- Enable recursion on the page queues lock. This allows calls to
vm_page_alloc(VM_ALLOC_NORMAL) and UMA's obj_alloc() with the page
queues lock held. Such calls are made to allocate page table pages
and pv entries.
- The previous change enables a partial reversion of vm/vm_page.c
revision 1.216, i.e., the call to vm_page_alloc() by vm_page_cowfault()
now specifies VM_ALLOC_NORMAL rather than VM_ALLOC_INTERRUPT.
- Add partial locking to pmap_copy(). (As a side-effect, pmap_copy()
should now be faster on i386 SMP because it no longer generates IPIs
for TLB shootdown on the other processors.)
- Complete the locking of pmap_enter() and pmap_enter_quick(). (As of now,
all changes to a user-level pmap on alpha, amd64, and i386 are performed
with appropriate locking.)


# 132505 21-Jul-2004 cognet

Using NULL as a malloc type when calling contigmalloc() is wrong, so introduce
a new malloc type, and use it.


# 132365 18-Jul-2004 alc

Utilize pmap_pte_quick() rather than pmap_pte() in pmap_protect(). The
reason being that pmap_pte_quick() requires the page queues lock, which is
already held, rather than Giant.


# 132318 17-Jul-2004 alc

Remedy my omission of one change in the previous revision: pmap_remove()
must pin the current thread in order to call pmap_pte_quick().


# 132316 17-Jul-2004 alc

- Utilize pmap_pte_quick() rather than pmap_pte() in pmap_remove() and
pmap_remove_page(). The reason being that pmap_pte_quick() requires
the page queues lock, which is already held, rather than Giant.
- Assert that the page queues lock is held in pmap_remove_page() and
pmap_remove_pte().


# 132220 15-Jul-2004 alc

Push down the acquisition and release of the page queues lock into
pmap_protect() and pmap_remove(). In general, they require the lock in
order to modify a page's pv list or flags. In some cases, however,
pmap_protect() can avoid acquiring the lock.


# 132082 13-Jul-2004 alc

Push down the acquisition and release of the page queues lock into
pmap_remove_pages(). (The implementation of pmap_remove_pages() is
optional. If pmap_remove_pages() is unimplemented, the acquisition and
release of the page queues lock is unnecessary.)

Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().


# 131744 07-Jul-2004 alc

Simplify the control flow in pmap_extract(), enabling the elimination of a
PMAP_UNLOCK() call.


# 131730 07-Jul-2004 alc

White space and style changes only.


# 131666 06-Jul-2004 alc

Style changes to pmap_extract().


# 131301 29-Jun-2004 peter

Fix leftover argument to pmap_unuse_pt(). I committed the wrong diff.

Submitted by: Jon Noack <noackjr@alumni.rice.edu>


# 131272 29-Jun-2004 peter

Reduce the size of pv entries by 15%. This saves 1MB of KVA for mapping
pv entries per 1GB of user virtual memory. (e.g.: if a 1GB file was
mmapped into 30 processes, that would theoretically reduce the KVA demand by
30MB for pv entries. In reality though, we limit pv entries so we don't
have that many at once.)

We used to store the vm_page_t for the page table page. But we recently
had the pa of the ptp, or can calculate it fairly quickly. If we wanted
to avoid the shift/mask operation in pmap_pde(), we could recover the
pa but that means we have to store it for a while.

This does not measurably change performance.

Suggested by: alc
Tested by: alc


# 131150 26-Jun-2004 alc

In case pmap_extract_and_hold() is ever performed on a different pmap than
the current one, we need to pin the current thread to its CPU.

Submitted by: tegge@


# 130932 22-Jun-2004 alc

Implement the protection check required by the pmap_extract_and_hold()
specification. This enables the elimination of Giant from that function.

Reviewed by: tegge@


# 130814 20-Jun-2004 alc

- Simplify pmap_remove_pages(), eliminating unnecessary indirection.
- Simplify the locking of pmap_is_modified() by converting control flow to
data flow.


# 130765 20-Jun-2004 alc

Add pmap locking to pmap_is_prefaultable().


# 130738 19-Jun-2004 alc

Remove unused pt_entry_ts. Remove an unneeded semicolon.


# 130626 17-Jun-2004 alc

Do not preset PG_BUSY on VM_ALLOC_NOOBJ pages. Such pages are not
accessible through an object. Thus, PG_BUSY serves no purpose.


# 130573 16-Jun-2004 alc

MFamd64
Introduce pmap locking to many of the pmap functions.


# 130560 16-Jun-2004 alc

MFamd64
Remove dead or unneeded code, e.g., spl calls.


# 130539 15-Jun-2004 alc

Remove a stale comment.


# 130433 13-Jun-2004 alc

Prevent the loss of a PG_M bit through an SMP race in pmap_ts_referenced().


# 130386 12-Jun-2004 alc

In a multiprocessor, the PG_W bit in the pte must be changed atomically.
Otherwise, the setting of the PG_M bit by one processor could be lost if
another processor is simultaneously changing the PG_W bit.

Reviewed by: tegge@


# 129817 28-May-2004 alc

Remove a broken micro-optimization from pmap_enter(). The ill effect
of this micro-optimization occurs when we call pmap_enter() to wire an
already mapped page. Because of the micro-optimization, we fail to
mark the PTE as wired. Later, on teardown of the address space,
pmap_remove_pages() destroys the PTE before vm_fault_unwire() has
unwired the page. (pmap_remove_pages() is not supposed to destroy
wired PTEs. They are destroyed by a later call to pmap_remove().)
Thus, the page becomes lost.

Note: The page is not lost if the application called munlock(2), only
if it relies on teardown of the address space to unwire its pages.

For the historically inclined, this bug was introduced by a
megacommit, revision 1.182, roughly six years ago.

Leak observed by: green@ and dillon independently
Patch submitted by: dillon at backplane dot com
Reviewed by: tegge@
MFC after: 1 week


# 128098 10-Apr-2004 alc

- pmap_kenter_temporary()'s first parameter, which is a physical address,
should be declared as vm_paddr_t not vm_offset_t.


# 127875 05-Apr-2004 alc

Remove avail_start on those platforms that no longer use it. (Only amd64
does anything with it beyond simple initialization.)


# 127869 04-Apr-2004 alc

Remove unused arguments from pmap_init().


# 126728 07-Mar-2004 alc

Retire pmap_pinit2(). Alpha was the last platform that used it. However,
ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps,
there has been no reason for having this function on Alpha. Briefly,
when pmap_growkernel() relied upon the list of all processes to find and
update the various pmaps to reflect a growth in the kernel's valid
address space, pmap_pinit2() served to avoid a race between pmap
initialization and pmap_growkernel(). Specifically, pmap_pinit2() was
responsible for initializing the kernel portions of the pmap and
pmap_pinit2() was called after the process structure contained a pointer
to the new pmap for use by pmap_growkernel(). Thus, an update to the
kernel's address space might be applied to the new pmap unnecessarily,
but an update would never be lost.


# 125308 01-Feb-2004 alc

Eliminate all TLB shootdowns by pmap_pte_quick(): By temporarily pinning
the thread that calls pmap_pte_quick() and by virtue of the page queues
lock being held, we can manage PADDR1/PMAP1 as a CPU private mapping.

The most common effect of this change is to reduce the overhead of the page
daemon on multiprocessors.

In collaboration with: tegge


# 124956 25-Jan-2004 jeff

- Now that both schedulers support temporary cpu pinning use this rather
than the switchin functions to guarantee that we're operating with the
correct tlb entry.
- Remove the post copy/zero tlb invalidations. It is faster to invalidate
an entry that is known to exist and so it is faster to invalidate after
use. However, some architectures implement speculative page table
prefetching so we can not be guaranteed that the invalidated entry is still
invalid when we re-enter any of these functions. As a result of this we
must always invalidate before use to be safe.


# 124360 11-Jan-2004 alc

Include "opt_cpu.h" and related #ifdef's for SSE so that pagezero()
actually includes the call to sse2_pagezero().


# 123945 28-Dec-2003 alc

Don't bother clearing PG_ZERO on the page table page in _pmap_allocpte();
it serves no purpose.


# 123925 28-Dec-2003 alc

Don't bother clearing and setting PG_BUSY on page table directory pages.


# 123710 21-Dec-2003 alc

- Significantly reduce the number of preallocated pv entries in
pmap_init(). Such a large preallocation is unnecessary and wastes
nearly eight megabytes of kernel virtual address space per gigabyte
of managed physical memory.
- Increase UMA_BOOT_PAGES by two. This enables the removal of
pmap_pv_allocf(). (Note: this function was only used during
initialization, specifically, after pmap_init() but before
pmap_init2(). During pmap_init2(), a new allocator is installed.)


# 123650 18-Dec-2003 jhb

MFamd64: Remove i386_protection_init() and the protection_codes[] array
and replace them with a simple if test to turn on PG_RW. i386 != vax.


# 122284 08-Nov-2003 alc

- Similar to post-PAE RELENG_4 split pmap_pte_quick() into two cases,
pmap_pte() and pmap_pte_quick(). The distinction being based upon the
locks that are held by the caller. When the given pmap is not the
current pmap, pmap_pte() should be used when Giant is held and
pmap_pte_quick() should be used when the vm page queues lock is held.
- When assigning to PMAP1 or PMAP2, include PG_A and PG_M.
- Reenable the inlining of pmap_is_current().

In collaboration with: tegge


# 121996 03-Nov-2003 jhb

New i386 SMP code:

- The MP code no longer knows anything specific about an MP Table.
Instead, the local APIC code adds CPUs via the cpu_add() function when
a local APIC is enumerated by an APIC enumerator.
- Don't divide the argument to mp_bootaddress() by 1024 just so that we
can turn around and multiply it by 1024 again.
- We no longer panic if SMP is enabled but we are booted on a UP machine.
- init_secondary(), the asm code between init_secondary() and ap_init()
in mpboot.s and ap_init() have all been merged together in C into
init_secondary().
- We now use the cpuid feature bits to determine if we should enable
PSE, PGE, or VME on each AP.
- Due to the change in the implementation of critical sections, acquire
the SMP TLB mutex around a slightly larger chunk of code for TLB
shootdowns.
- Remove some of the debug code from the original SMP implementation
that is no longer used or no longer applies to the new APIC code.
- Use a temporary hack to disable the ACPI module until the SMP code has
been further reorganized to allow ACPI to work as a module again.
- Add a DDB command to dump the interesting contents of the IDT.


# 121823 31-Oct-2003 jhb

For physical address regions between 0 and KERNLOAD, allow pmap_mapdev()
to use the direct mapped KVA at KERNBASE to service the request. This also
allows pmap_mapdev() to be used for such addresses very early during the
boot process and might provide some small savings on KVA.

Reviewed by: peter


# 121758 30-Oct-2003 peter

Change the pmap_invalidate_xxx() functions so they test against
pmap == kernel_pmap rather than pmap->pm_active == -1. gcc's inliner
can remove more code that way. Only kernel_pmap has a pm_active of -1.


# 121621 27-Oct-2003 jhb

Fix pmap_unmapdev() to call pmap_kremove() instead of implementing it
directly so that it more closely mirrors pmap_mapdev() which calls
pmap_kenter().


# 121512 25-Oct-2003 peter

For the SMP case, flush the TLB at the beginning of the page zero/copy
routines. Otherwise we run into trouble with speculative tlb preloads
on SMP systems. This effectively defeats Jeff's revision 1.438
optimization (for his pentium4-M laptop) in the SMP case. It breaks
other systems, particularly athlon-MP's.


# 121494 25-Oct-2003 peter

GC workaround code for detecting pentium4's and disabling PSE and PG_G.
It's been ifdef'ed out for ages.


# 121102 14-Oct-2003 peter

Get some more data if we hit the pmap_enter() thing.


# 121055 13-Oct-2003 alc

- Modify pmap_is_current() to return FALSE when a pmap's page table is in
use because a kernel thread is borrowing it. The borrowed page table
can change spontaneously, making any dependence on its continued use
subject to a race condition.
- _pmap_unwire_pte_hold() cannot use pmap_is_current(): If a change is
made to a page table page mapping for a borrowed page table, the TLB
must be updated.

In collaboration with: tegge


# 121024 12-Oct-2003 phk

Initialize CMAP3 to 0


# 120772 04-Oct-2003 alc

Don't bother setting a page table page's valid field. It is unused and
not setting it is consistent with other uses of VM_ALLOC_NOOBJ pages.


# 120734 04-Oct-2003 jeff

- The proper test is CPU_ENABLE_SSE and not CPU_ENABLED_SSE. This
effectively disabled the sse2_pagezero() code.

Spotted by: bde


# 120722 03-Oct-2003 alc

Migrate pmap_prefault() into the machine-independent virtual memory layer.

A small helper function pmap_is_prefaultable() is added. This function
encapsulates the few lines of pmap_prefault() that actually vary from
machine to machine. Note: pmap_is_prefaultable() and pmap_mincore() have
much in common. Going forward, it's worth considering their merger.


# 120654 01-Oct-2003 peter

Commit Bosko's patch to clean up the PSE/PG_G initialization and
avoid problems with some Pentium 4 cpus and some older PPro/Pentium2
cpus. There are several problems, some documented in Intel errata.
This patch:
1) moves the kernel to the second page in the PSE case. There is an
erratum that says that you Must Not point a 4MB page at physical
address zero on older cpus. We avoided bugs here due to sheer luck.
2) sets up PSE page tables right from the start in locore, rather than
trying to switch from 4K to 4M (or 2M) pages part way through the boot
sequence at the same time that we're messing with PG_G.

For some reason, the pmap work over the last 18 months seems to tickle
the problems, and the PAE infrastructure changes disturb the cpu
bugs even more.

A couple of people have reported a problem with APM bios calls during
boot. I'll work with people to get this resolved.

Obtained from: bmilekic


# 120623 01-Oct-2003 jeff

- Hide more #ifdef logic in a new invlcaddr inline. This function flushes
the full tlb if you're on an i386, or does an invlpg otherwise.

Glanced at by: peter
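
Roughly what such an inline might look like (a sketch, assuming that
I386_CPU selects 80386 support, which lacks the invlpg instruction):

        static __inline void
        invlcaddr(void *caddr)
        {
        #ifdef I386_CPU
                invltlb();              /* no invlpg on the 80386 */
        #else
                invlpg((u_int)caddr);   /* invalidate just this entry */
        #endif
        }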


# 120621 01-Oct-2003 jeff

- Define an inline pagezero() to select the appropriate full-page zeroing
function from one of bzero, i686_pagezero, or sse2_pagezero.
- Use pagezero() in the three pmap functions that need to zero full pages.
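
A plausible shape for the selector (a sketch; the exact dispatch
conditions in the committed code are assumed, not quoted):

        static __inline void
        pagezero(void *page)
        {
        #if defined(I686_CPU) && defined(CPU_ENABLE_SSE)
                if (cpu_feature & CPUID_SSE2)
                        sse2_pagezero(page);    /* non-temporal stores */
                else
                        i686_pagezero(page);    /* skips already-zero words */
        #else
                bzero(page, PAGE_SIZE);
        #endif
        }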


# 120613 30-Sep-2003 jeff

- Correct a problem with the last commit. The CMAP ptes need to be zeroed
prior to invalidating the TLB to be certain that the processor doesn't
keep a cached copy.

Discussed with: peter
Panicked: tegge
Pointy Hat: The usual spot


# 120602 30-Sep-2003 jeff

- On my Pentium4-M laptop, invlpg takes ~1100 cycles if the page is found in
the TLB and ~1600 if it is not. Therefore, it is more efficient to
invalidate the TLB after operations that use CMAP rather than before.
- So that the tlb is invalidated prior to switching off of a processor, we
must change the switchin functions to switchout functions.
- Remove td_switchout from the thread and move it to the x86 pcb.
- Move the code that calls switchout into swtch.s. These changes make this
optimization truly x86 specific.


# 120499 27-Sep-2003 alc

Addendum to the previous revision: If vm_page_alloc() for the page
table page fails, perform a VM_WAIT; update some comments in
_pmap_allocpte().


# 120424 25-Sep-2003 alc

- Eliminate the pte object.
- Use kmem_alloc_nofault() rather than kmem_alloc_pageable() to allocate
KVA space for the page directory page(s). Submitted by: tegge


# 120322 21-Sep-2003 alc

Allocate the page table directory page(s) as "no object" pages. (This
leaves one explicit use of the pte object.)


# 120307 20-Sep-2003 alc

Reimplement pmap_release() such that it uses the page table rather than the
pte object to locate the page table directory pages. (This is another step
toward the elimination of the pte object.)


# 120040 13-Sep-2003 alc

Simplify (and micro-optimize) pmap_unuse_pt(): Only one caller,
pmap_remove_pte(), passed NULL instead of the required page table
page to pmap_unuse_pt(). Compute the necessary page table page
in pmap_remove_pte(). Also, remove some unreachable code from
pmap_remove_pte().


# 119999 12-Sep-2003 alc

Add a new parameter to pmap_extract_and_hold() that is needed to eliminate
Giant from vmapbuf().

Idea from: tegge


# 119869 08-Sep-2003 alc

Introduce a new pmap function, pmap_extract_and_hold(). This function
atomically extracts and holds the physical page that is associated with the
given pmap and virtual address. Such a function is needed to make the
memory mapping optimizations used by, for example, pipes and raw disk I/O
MP-safe.

Reviewed by: tegge
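
Conceptually the lookup and the hold are paired under one lock; a
simplified sketch (locking shown only as comments, details assumed):

        vm_page_t
        pmap_extract_and_hold(pmap_t pmap, vm_offset_t va)
        {
                vm_paddr_t pa;
                vm_page_t m = NULL;

                /* ... lock out changes to the pmap ... */
                pa = pmap_extract(pmap, va);
                if (pa != 0) {
                        m = PHYS_TO_VM_PAGE(pa);
                        vm_page_hold(m);    /* page can't be freed now */
                }
                /* ... unlock ... */
                return (m);
        }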


# 119452 25-Aug-2003 obrien

Fix copyright comment & FBSDID style nits.

Requested by: bde


# 119276 22-Aug-2003 alc

Eliminate the last (direct) use of vm_page_lookup() on the pte object.


# 119139 19-Aug-2003 alc

Eliminate a possible race condition for multithreaded applications in
_pmap_allocpte(): Guarantee that the page table page is zero filled before
adding it to the directory. Otherwise, a 2nd, 3rd, etc. thread could
access a nearby virtual address and use garbage for the address
translation.

Discussed with: peter, tegge


# 119054 17-Aug-2003 alc

Acquire the pte object's mutex when performing vm_page_grab(). Note: It is
my long-term objective to eliminate the pte object. In the near term, this
does, however, enable the addition of some vm object locking assertions.


# 119006 17-Aug-2003 alc

In pmap_copy(), since we have the page table page's physical address
in hand, use PHYS_TO_VM_PAGE() rather than vm_page_lookup().


# 118894 14-Aug-2003 alc

Eliminate pmap_page_lookup() and its uses. Instead, use PHYS_TO_VM_PAGE()
to convert the pte's physical address into a vm page.

Reviewed by: peter


# 118741 10-Aug-2003 alc

Rename pmap_changebit() to pmap_clear_ptes() and remove the last
parameter. The new name better reflects what the function does and
how it is used. The last parameter was always FALSE.

Note: In theory, gcc would perform constant propagation and dead code
elimination to achieve the same effect as removing the last parameter,
which is always FALSE. In practice, recent versions do not. So, there
is little point in letting unused code pessimize execution.


# 118562 06-Aug-2003 alc

Correct a mistake in the previous revision: Reduce the scope of the page
queues lock such that it isn't held around the call to get_pv_entry(),
which calls uma_zalloc(). At the point of the call to get_pv_entry(), the
lock isn't necessary and holding it could lead to recursive acquisition,
which isn't allowed.


# 118561 06-Aug-2003 alc

Acquire the page queues lock in pmap_insert_entry(). (I used to believe
that the page's busy flag could be relied upon to synchronize access to the
pv list. I don't any longer. See, for example, the call to
pmap_insert_entry() from pmap_copy().)


# 118344 02-Aug-2003 alc

- Use kmem_alloc_nofault() rather than kmem_alloc_pageable() in
pmap_mapdev(). See revision 1.140 of kern/sys_pipe.c for a detailed
rationale. Submitted by: tegge
- Remove GIANT_REQUIRED from pmap_mapdev().


# 118244 31-Jul-2003 bmilekic

Make sure that when the PV ENTRY zone is created in pmap, that it's
created not only with UMA_ZONE_VM but also with UMA_ZONE_NOFREE. In
the i386 case in particular, the pmap code would hook a special
page allocation routine that allocated from kernel_map and not kmem_map,
and so when/if the pageout daemon drained the zones, it could actually
push out slabs from the PV ENTRY zone but call UMA's default page_free,
which resulted in pages allocated from kernel_map being freed to
kmem_map; bad. kmem_free() ignores the return value of the
vm_map_delete and just returns. I'm not sure what the exact
repercussions could be, but it doesn't look good.

In the PAE case on i386, we also set-up a zone in pmap, so be
conservative for now and make that zone also ZONE_NOFREE and
ZONE_VM. Do this for the pmap zones for the other archs too,
although in some cases it may not be entirely necessary. We'd
rather be safe than sorry at this point.

Perhaps all UMA_ZONE_VM zones should by default be also
UMA_ZONE_NOFREE?

May fix some of silby's crashes on the PV ENTRY zone.


# 117929 23-Jul-2003 alc

Annotate pmap_changebit() as __always_inline. This function was
written as a template that when inlined is specialized for the caller
through constant value propagation and dead code elimination. Thus,
the specialized code that is generated for pmap_clear_reference() et
al. avoids several conditional branches inside of a loop.


# 117372 09-Jul-2003 peter

unifdef -DLAZY_SWITCH and start to tidy up the associated glue.


# 117340 08-Jul-2003 alc

In pmap_object_init_pt(), the pmap_invalidate_all() should be performed on
the caller-provided pmap, not the kernel_pmap. Using the kernel_pmap
results in an unnecessary IPI for TLB shootdown on SMPs.

Reviewed by: jake, peter


# 117242 04-Jul-2003 alc

Add vm object locking to pmap_prefault().


# 117206 03-Jul-2003 alc

Background: pmap_object_init_pt() premaps the pages of an object in
order to avoid the overhead of later page faults. In general, it
implements two cases: one for vnode-backed objects and one for
device-backed objects. Only the device-backed case is really
machine-dependent, belonging in the pmap.

This commit moves the vnode-backed case into the (relatively) new
function vm_map_pmap_enter(). On amd64 and i386, this commit only
amounts to code rearrangement. On alpha and ia64, the new machine
independent (MI) implementation of the vnode case is smaller and more
efficient than their pmap-based implementations. (The MI
implementation takes advantage of the fact that objects in -CURRENT
are ordered collections of pages.) On sparc64, pmap_object_init_pt()
hadn't (yet) been implemented.


# 117045 29-Jun-2003 alc

- Export pmap_enter_quick() to the MI VM. This will permit the
implementation of a largely MI pmap_object_init_pt() for vnode-backed
objects. pmap_enter_quick() is implemented via pmap_enter() on sparc64
and powerpc.
- Correct a mismatch between pmap_object_init_pt()'s prototype and its
various implementations. (I plan to keep pmap_object_init_pt() as
the MD hook for device-backed objects on i386 and amd64.)
- Correct an error in ia64's pmap_enter_quick() and adjust its interface
to match the other versions. Discussed with: marcel


# 116930 27-Jun-2003 peter

Tidy up leftover lazy_switch instrumentation that is no longer needed.
This cleans up some #ifdef hell.


# 116926 27-Jun-2003 peter

Fix the false IPIs on smp when using LAZY_SWITCH caused by pmap_activate()
not releasing the pm_active bit in the old pmap.


# 116361 14-Jun-2003 davidxu

Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.


# 116355 14-Jun-2003 alc

Migrate the thread stack management functions from the machine-dependent
to the machine-independent parts of the VM. At the same time, this
introduces vm object locking for the non-i386 platforms.

Two details:

1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The
different machine-dependent implementations used various combinations
of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable the guard page, set
KSTACK_GUARD_PAGES to 0.

2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In
5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed
to vm_page_alloc() or vm_page_grab().


# 116328 14-Jun-2003 alc

Move the *_new_altkstack() and *_dispose_altkstack() functions out of the
various pmap implementations into the machine-independent vm. They were
all identical.


# 116304 13-Jun-2003 alc

Add vm object locking to pmap_object_init_pt().


# 116279 13-Jun-2003 alc

Add vm object locking to various pagers' "get pages" methods, i386 stack
management functions, and a u area management function.


# 115683 02-Jun-2003 obrien

Use __FBSDID().


# 114177 28-Apr-2003 jake

Use inlines for loading and storing page table entries. Use cmpxchg8b for
the PAE case to ensure idempotent 64 bit loads and stores.

Sponsored by: DARPA, Network Associates Laboratories
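
The trick, sketched below, mirrors the well-known i486+ idiom (the
committed code is assumed to differ in detail): cmpxchg8b always reads
all 64 bits atomically, and by making the compare value equal the store
value the instruction degenerates into a pure load.

        static __inline uint64_t
        pte_load(volatile uint64_t *ptep)
        {
                uint64_t r;

                /*
                 * Copy ebx into eax and ecx into edx so that a "match"
                 * rewrites *ptep with its current contents (a no-op),
                 * while a mismatch loads the old value into edx:eax.
                 * Either way r holds an atomically read 64-bit pte.
                 */
                __asm __volatile(
                    "movl %%ebx,%%eax\n\t"
                    "movl %%ecx,%%edx\n\t"
                    "lock; cmpxchg8b %1"
                    : "=&A" (r), "+m" (*ptep)
                    : : "memory", "cc");
                return (r);
        }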


# 114013 25-Apr-2003 jake

Remove harmless invalid cast.

Sponsored by: DARPA, Network Associates Laboratories


# 113040 03-Apr-2003 jake

- Removed APTD and associated macros; they are no longer used.

BANG BANG BANG etc.

Sponsored by: DARPA, Network Associates Laboratories


# 112993 02-Apr-2003 peter

Commit a partial lazy thread switch mechanism for i386. It isn't as lazy
as it could be and can do with some more cleanup. Currently it's under
options LAZY_SWITCH. What this does is avoid %cr3 reloads for short
context switches that do not involve another user process. ie: we can
take an interrupt, switch to a kthread and return to the user without
explicitly flushing the tlb. However, this isn't as exciting as it could
be, the interrupt overhead is still high and too much blocks on Giant
still. There are some debug sysctls, for stats and for an on/off switch.

The main problem with doing this has been "what if the process that you're
running on exits while we're borrowing its address space?" - in this case
we use an IPI to give it a kick when we're about to reclaim the pmap.

It's not compiled in unless you add the LAZY_SWITCH option. I want to fix a
few more things and get some more feedback before turning it on by default.

This is NOT a replacement for Bosko's lazy interrupt stuff. This was more
meant for the kthread case, while his was for interrupts. Mine helps a
little for interrupts, but his helps a lot more.

The stats are enabled with options SWTCH_OPTIM_STATS - this has been a
pseudo-option for years, I just added a bunch of stuff to it.

One non-trivial change was to select a new thread before calling
cpu_switch() in the first place. This allows us to catch the silly
case of doing a cpu_switch() to the current process. This happens
uncomfortably often. This simplifies a bit of the asm code in cpu_switch
(no longer have to call choosethread() in the middle). This has been
implemented on i386 and (thanks to jake) sparc64. The others will come
soon. This is actually separate from the lazy switch stuff.

Glanced at by: jake, jhb


# 112841 30-Mar-2003 jake

- Add support for PAE and more than 4 gigs of ram on x86, dependent on the
kernel option 'options PAE'. This will only work with device drivers which
either use busdma, or are able to handle 64 bit physical addresses.

Thanks to Lanny Baron from FreeBSD Systems for the loan of a test machine
with 6 gigs of ram.

Sponsored by: DARPA, Network Associates Laboratories, FreeBSD Systems


# 112837 29-Mar-2003 jake

- Remove invalid casts.

Sponsored by: DARPA, Network Associates Laboratories


# 112836 29-Mar-2003 jake

- Convert all uses of pmap_pte and get_ptbase to pmap_pte_quick. When
accessing an alternate address space this causes 1 page table page at
a time to be mapped in, rather than using the recursive mapping technique
to map in an entire alternate address space. The recursive mapping
technique changes large portions of the address space and requires global
tlb flushes, which seem to cause problems when PAE is enabled. This will
also allow IPIs to be avoided when mapping in new page table pages using
the same technique as is used for pmap_copy_page and pmap_zero_page.

Sponsored by: DARPA, Network Associates Laboratories


# 112569 24-Mar-2003 jake

- Add vm_paddr_t, a physical address type. This is required for systems
where physical addresses are larger than virtual addresses, such as i386s
with PAE.
- Use this to represent physical addresses in the MI vm system and in the
i386 pmap code. This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
detection code, and due to kvtop returning vm_paddr_t instead of u_long.

Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.

Sponsored by: DARPA, Network Associates Laboratories
Discussed with: re, phk (cdevsw change)


# 112133 12-Mar-2003 jake

- Added support for multiple page directory pages to pmap_pinit and
pmap_release.
- Merged pmap_release and pmap_release_free_page. When pmap_release is
called only the page directory page(s) can be left in the pmap pte object,
since all page table pages will have been freed by pmap_remove_pages and
pmap_remove. In addition, there can only be one reference to the pmap and
the page directory is wired, so the page(s) can never be busy. So all there
is to do is clear the magic mappings from the page directory and free the
page(s).

Sponsored by: DARPA, Network Associates Laboratories


# 111585 27-Feb-2003 julian

Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.


# 111493 25-Feb-2003 jake

- Added inlines pmap_is_current, pmap_is_alternate and pmap_set_alternate
for testing and setting the current and alternate address spaces.
- Changed PTDpde and APTDpde to arrays to support multiple page directory
pages.

Sponsored by: DARPA, Network Associates Laboratories


# 111462 25-Feb-2003 mux

Cleanup of the d_mmap_t interface.

- Get rid of the useless atop() / pmap_phys_address() detour. The
device mmap handlers must now give back the physical address
without atop()'ing it.
- Don't borrow the physical address of the mapping in the returned
int. Now we properly pass a vm_offset_t * and expect it to be
filled by the mmap handler when the mapping was successful. The
mmap handler must now return 0 when successful, any other value
is considered as an error. Previously, returning -1 was the only
way to fail. This change thus accidentally fixes some devices
which were bogusly returning errno constants which would have been
considered as addresses by the device pager.
- Garbage collect the poorly named pmap_phys_address() now that it's
no longer used.
- Convert all the d_mmap_t consumers to the new API.

I'm still not sure whether we need a __FreeBSD_version bump for this,
since we didn't guarantee API/ABI stability until 5.1-RELEASE.

Discussed with: alc, phk, jake
Reviewed by: peter
Compile-tested on: LINT (i386), GENERIC (alpha and sparc64)
Runtime-tested on: i386


# 111385 23-Feb-2003 jake

Use the direct mapping of IdlePTD setup in locore for proc0's page directory,
instead of allocating another page of kva and mapping it in again. This was
likely an oversight in revision 1.174 (cut and paste from pmap_pinit).

Discussed with: peter, tegge
Sponsored by: DARPA, Network Associates Laboratories


# 111363 23-Feb-2003 jake

- Added macros NPGPTD, NBPTD, and NPDEPTD, for dealing with the size of the
page directory.
- Use these instead of the magic constants 1 or PAGE_SIZE where appropriate.
There are still numerous assumptions that the page directory is exactly
1 page.

Sponsored by: DARPA, Network Associates Laboratories
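
A sketch of the macros' likely shape (non-PAE values shown for
illustration; under PAE the page directory spans four pages):

        #define NPGPTD  1                       /* page directory pages */
        #define NBPTD   (NPGPTD << PAGE_SHIFT)  /* bytes of page directory */
        #define NPDEPTD (NBPTD / sizeof(pd_entry_t))    /* total PDEs */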


# 111299 23-Feb-2003 jake

- Added macros PDESHIFT and PTESHIFT, use these instead of magic constants
in locore.
- Removed the macros PTESIZE and PDESIZE, use sizeof instead in C.

Sponsored by: DARPA, Network Associates Laboratories


# 111272 22-Feb-2003 alc

The root of the splay tree maintained within the pm_pteobj always refers
to the last accessed pte page. Thus, the pm_ptphint is redundant and can
be removed.


# 110955 15-Feb-2003 alc

Assert that the kernel map's system mutex is held in pmap_growkernel().


# 110845 14-Feb-2003 alc

- Add a mutex for synchronizing the use of CMAP/CADDR 1 and 2.
- Eliminate small style differences between pmap_zero_page(),
pmap_copy_page(), etc.


# 110781 13-Feb-2003 peter

Oops. I mis-remembered about the P4 problems. It was 5.0-DP2 that
was shipped with DISABLE_PG_G and DISABLE_PSE, not 5.0-REL. *blush*
Disable the code - but still leave it there in case it's still lurking.


# 110780 12-Feb-2003 peter

Turn off PG_PS and PG_G for Pentium-4 cpus at boot time. This is so
that we can stop turning off PG_G and PG_PS globally for releases.


# 110747 12-Feb-2003 alc

Remove kptobj. Instead, use VM_ALLOC_NOOBJ.


# 110532 08-Feb-2003 alc

MF alpha
- Synchronize access to the allpmaps list with a mutex.


# 110480 06-Feb-2003 peter

Commit some cosmetic changes I had laying around and almost included
with another commit. Unwrap a line. Unexpand a pmap_kenter().


# 110254 02-Feb-2003 alc

- Make allpmaps static.
- Use atomic subtract to update the global wired pages count. (See
also vm/vm_page.c revision 1.233.)
- Assert that the page queue lock is held in pmap_remove_entry().


# 109964 28-Jan-2003 alc

Merge pmap_testbit() and pmap_is_modified(). The latter is the only caller
of the former.


# 108533 01-Jan-2003 schweikh

Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.


# 108337 27-Dec-2002 alc

Assert that the page queues lock is held in pmap_testbit().


# 108253 24-Dec-2002 alc

- Hold the page queues lock around calls to vm_page_wakeup() and
vm_page_flag_clear().


# 107853 14-Dec-2002 alc

Add page locking to pmap_mincore().

Submitted (in part) by: tjr@


# 107538 03-Dec-2002 alc

Avoid recursive acquisition of the page queues lock in pmap_unuse_pt().

Approved by: re


# 107490 02-Dec-2002 alc

Hold the page queues lock when calling pmap_unwire_pte_hold() or
pmap_remove_pte(). Use vm_page_sleep_if_busy() in
_pmap_unwire_pte_hold() so that the page queues lock is released
when sleeping.

Approved by: re (blanket)


# 107434 30-Nov-2002 alc

Assert that the page queues lock is held in pmap_changebit()
and pmap_ts_referenced().

Approved by: re (blanket)


# 107410 30-Nov-2002 alc

Assert that the page queues lock is held in pmap_page_exists_quick().

Approved by: re (blanket)


# 107217 25-Nov-2002 alc

Assert that the page queues lock is held in pmap_remove_pages().

Approved by: re (blanket)


# 107184 23-Nov-2002 alc

- Assert that the page queues lock is held in pmap_remove_all().
- Fix a diagnostic message and comment in pmap_remove_all().
- Eliminate excessive white space from pmap_remove_all().

Approved by: re


# 106838 13-Nov-2002 alc

Move pmap_collect() out of the machine-dependent code, rename it
to reflect its new location, and add page queue and flag locking.

Notes: (1) alpha, i386, and ia64 had identical implementations
of pmap_collect() in terms of machine-independent interfaces;
(2) sparc64 doesn't require it; (3) powerpc had it as a TODO.


# 106753 11-Nov-2002 alc

- Clear the page's PG_WRITEABLE flag in the i386's pmap_changebit()
if we're removing write access from the page's PTEs.
- Export pmap_remove_all() on alpha, i386, and ia64. (It's already
exported on sparc64.)


# 106567 07-Nov-2002 alc

Simplify and optimize pmap_object_init_pt(). More specifically,
take advantage of the fact that the vm object's list of pages is
now ordered to reduce the overhead of finding the desired set of
pages to be mapped. (See revision 1.215 of vm/vm_page.c.)


# 104354 02-Oct-2002 scottl

Some kernel threads try to do significant work, and the default KSTACK_PAGES
doesn't give them enough stack to do much before blowing away the pcb.
This adds MI and MD code to allow the allocation of an alternate kstack
whose size can be specified when calling kthread_create. Passing the
value 0 prevents the alternate kstack from being created. Note that the
ia64 MD code is missing for now, and PowerPC was only partially written
due to the pmap.c being incomplete there.
Though this patch does not modify anything to make use of the alternate
kstack, acpi and usb are good candidates.

Reviewed by: jake, peter, jhb


# 104319 01-Oct-2002 phk

The pmap_prefault_pageorder[] array was initialized with wrong values
due to a missing comma.

I have no idea what trouble, if any, this may have caused.

Pointed out by: FlexeLint


# 104094 28-Sep-2002 phk

Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by: FlexeLint warning #512


# 102399 25-Aug-2002 alc

o Retire pmap_pageable(). It's an advisory routine that none
of our platforms implements.


# 102241 21-Aug-2002 archie

Don't use "NULL" when "0" is really meant.


# 102041 18-Aug-2002 alc

o Simplify the ptphint test in pmap_release_free_page(). In other words,
make it just like the test in _pmap_unwire_pte_hold().


# 101770 13-Aug-2002 alc

o Remove an unnecessary vm_page_flash() from _pmap_unwire_pte_hold().

Reviewed by: peter


# 101751 12-Aug-2002 alc

o Convert three instances of vm_page_sleep_busy() into vm_page_sleep_if_busy()
with page queue locking.


# 101721 12-Aug-2002 iedowse

Use roundup2() to avoid a problem where pmap_growkernel was unable
to extend the kernel VM to the maximum possible address of 4G-4M.

PR: i386/22441
Submitted by: Bill Carpenter <carp@world.std.com>
Reviewed by: alc


# 101635 10-Aug-2002 alc

o Remove the setting and clearing of the PG_MAPPED flag. (This flag is
obsolete.)


# 101352 05-Aug-2002 peter

Revert rev 1.356 and 1.352 (pmap_mapdev hacks). It wasn't worth the
pain.


# 101322 04-Aug-2002 peter

Fix a mistake in 1.352 - I was returning a pointer to the rounded down
address. I expect this will fix acpica.


# 101294 04-Aug-2002 alc

o Request a wired page from vm_page_grab() in _pmap_allocpte().


# 101280 03-Aug-2002 alc

o Ask for a prezeroed page in pmap_pinit() for the page directory page.


# 101254 03-Aug-2002 alc

o Don't set PG_MAPPED on the page allocated and mapped in _pmap_allocpte().
(Only set this flag if the mapping has a corresponding pv list entry,
which this mapping doesn't.)


# 101249 02-Aug-2002 peter

Take advantage of the fact that there is a small 1MB direct mapped region
on x86 in between KERNBASE and the kernel load address. pmap_mapdev()
can return pointers to this for devices operating in the isa "hole".


# 101197 02-Aug-2002 alc

o Lock page queue accesses by vm_page_deactivate().


# 101105 31-Jul-2002 alc

o Setting PG_MAPPED and PG_WRITEABLE on pages that are mapped and unmapped
by pmap_qenter() and pmap_qremove() is pointless. In fact, it probably
leads to unnecessary pmap_page_protect() calls if one of these pages is
paged out after unwiring.

Note: setting PG_MAPPED asserts that the page's pv list may be
non-empty. Since checking the status of the page's pv list isn't any
harder than checking this flag, the flag should probably be eliminated.
Alternatively, PG_MAPPED could be set by pmap_enter() exclusively
rather than various places throughout the kernel.


# 100912 30-Jul-2002 alc

o Lock page queue accesses by pmap_release_free_page().


# 100862 29-Jul-2002 alc

o Pass VM_ALLOC_WIRED to vm_page_grab() rather than calling vm_page_wire()
in pmap_new_thread(), pmap_pinit(), and vm_proc_new().
o Lock page queue accesses by vm_page_free() in pmap_object_init_pt().


# 100432 21-Jul-2002 peter

Move SWTCH_OPTIM_STATS related code out of cpufunc.h. (This sort of stat
gathering is not an x86 cpu feature)


# 100378 19-Jul-2002 alc

o Use vm_page_alloc(... | VM_ALLOC_WIRED) in place of vm_page_wire().


# 100264 17-Jul-2002 peter

Avoid trying to set PG_G on the first 4MB when we set up the 4MB page.
This solves the SMP panic for at least one system. I'd still like to know
why my xeon works though.

Tested by: bmilekic


# 99987 14-Jul-2002 alc

o Lock page queue accesses by vm_page_wire().


# 99931 13-Jul-2002 peter

Two invlpg's slipped through that were not protected from I386_CPU

Pointed out by: dillon


# 99930 13-Jul-2002 peter

invlpg() does not work too well on i386 cpus. Add token i386 support
back in to the pmap_zero_page* stuff.


# 99929 13-Jul-2002 peter

Do global shootdowns when switching to/from 4MB pages. I believe we can
do a shootdown on a 4MB "page" though, but this should be safer for now.

Noticed by: tegge


# 99928 13-Jul-2002 peter

Bandaid for SMP. Changing APTDpde without a global shootdown is not
safe yet. We used to do a global shootdown here anyway so another day
or so shouldn't hurt.


# 99925 13-Jul-2002 alc

o Lock some page queue accesses, in particular, those by vm_page_unwire().


# 99890 12-Jul-2002 dillon

Re-enable the idle page-zeroing code. Remove all IPIs from the idle
page-zeroing code as well as from the general page-zeroing code and use a
lazy tlb page invalidation scheme based on a callback made at the end
of mi_switch.

A number of people came up with this idea at the same time so credit
belongs to Peter, John, and Jake as well.

Two-way SMP buildworld -j 5 tests (second run, after stabilization)
2282.76 real 2515.17 user 704.22 sys before peter's IPI commit
2266.69 real 2467.50 user 633.77 sys after peter's commit
2232.80 real 2468.99 user 615.89 sys after this commit

Reviewed by: peter, jhb
Approved by: peter


# 99862 12-Jul-2002 peter

Revive backed out pmap related changes from Feb 2002. The highlights are:
- It actually works this time, honest!
- Fine grained TLB shootdowns for SMP on i386. IPI's are very expensive,
so try and optimize things where possible.
- Introduce ranged shootdowns that can be done as a single IPI.
- PG_G support for i386
- Specific-cpu targeted shootdowns. For example, there is no sense in
globally purging the TLB cache for where we are stealing a page from
the local unshared process on the local cpu. Use pm_active to track
this.
- Add some instrumentation for the tlb shootdown code.
- Rip out SMP code from <machine/cpufunc.h>
- Try and fix some very bogus PG_G and PG_PS interactions that were bad
enough to cause vm86 bios calls to break. vm86 depended on our existing
bugs and this was the cause of the VESA panics last time.
- Fix the silly one-line error that caused the 'panic: bad pte' last time.
- Fix a couple of other silly one-line errors that should have caused more
pain than they did.

Some more work is needed:
- pmap_{zero,copy}_page[_idle]. These can be done without IPI's if we
have a hook in cpu_switch.
- The IPI handlers need some cleanup. I have a bogus %ds load that can
be avoided.
- APTD handling is rather bogus and appears to be a large source of
global TLB IPI shootdowns for no really good reason.

I see speedups of between 1.5% and ~4% on buildworlds in a while 1 loop.
I expect to see a bigger difference when there is significant pageout
activity or the system otherwise has memory shortages.

I have backed out a few optimizations that I had been using over the last
few days in order to be a little more conservative. I'll revisit these
again over the next few days as the dust settles.

New option: DISABLE_PG_G - In case I missed something.


# 99852 12-Jul-2002 peter

Unexpand a couple of 8-space indents that I added in rev 1.285.


# 99571 08-Jul-2002 peter

Add a special page zero entry point intended to be called via the single
threaded VM pagezero kthread outside of Giant. For some platforms, this
is really easy since it can just use the direct mapped region. For others,
IPI sending is involved or there are other issues, so grab Giant when
needed.

We still have preemption issues to deal with, but Alan Cox has an
interesting suggestion on how to minimize the problem on x86.

Use Luigi's hack for preserving the (lack of) priority.

Turn the idle zeroing back on since it can now actually do something useful
outside of Giant in many cases.


# 99561 07-Jul-2002 peter

Fix a hideous TLB bug. pmap_unmapdev neglected to remove the device
mappings from the page tables, which were mapped with PG_G! We could
reuse the page table entry for another mapping (pmap_mapdev) but it
would never have cleared any remaining PG_G TLB entries.


# 99559 07-Jul-2002 peter

Collect all the (now equivalent) pmap_new_proc/pmap_dispose_proc/
pmap_swapin_proc/pmap_swapout_proc functions from the MD pmap code
and use a single equivalent MI version. There are other cleanups
needed still.

While here, use the UMA zone hooks to keep a cache of preinitialized
proc structures handy, just like the thread system does. This eliminates
one dependency on 'struct proc' being persistent even after being freed.
There are some comments about things that can be factored out into
ctor/dtor functions if it is worth it. For now they are mostly just
doing statistics to get a feel of how it is working.


# 99415 04-Jul-2002 peter

Diff reduction (microoptimization) with another WIP. Move the frame
calculation in get_ptbase() to a little later on.


# 99398 03-Jul-2002 julian

Don't free pages we never allocated..

My eyes opened by: Matt


# 99397 03-Jul-2002 julian

Slight restatement of the code and remove some unused variables.


# 99381 03-Jul-2002 julian

Add comments and slightly rearrange the thread stack assignment code
to try to make it less obscure.


# 99380 03-Jul-2002 julian

Remove vestiges of old code...
These functions are always called on new memory so they can
not already be set up, so don't bother testing for that.
(This was left over from before we used UMA (which is cool))


# 99072 29-Jun-2002 julian

Part 1 of KSE-III

The ability to schedule multiple threads per process
(on one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


# 98902 27-Jun-2002 arr

Fix for the problem stated below by Tor Egge:
(from: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=832566+0+ \
current/freebsd-current)

"Too many pages were prefaulted in pmap_object_init_pt, thus
the wrong physical page was entered in the pmap for the virtual
address where the .dynamic section was supposed to be."

Submitted by: tegge
Approved by: tegge's patches never fail


# 98892 26-Jun-2002 iedowse

Avoid using the 64-bit vm_pindex_t in a few places where 64-bit
types are not required, as the overhead is unnecessary:

o In the i386 pmap_protect(), `sindex' and `eindex' represent page
indices within the 32-bit virtual address space.
o In swp_pager_meta_build() and swp_pager_meta_ctl(), use a temporary
variable to store the low few bits of a vm_pindex_t that gets used
as an array index.
o vm_uiomove() uses `osize' and `idx' for page offsets within a
map entry.
o In vm_object_split(), `idx' is a page offset within a map entry.


# 98824 25-Jun-2002 iedowse

Complete the initial set of VM changes required to support full
64-bit file sizes. This step simply addresses the remaining overflows,
and does not attempt to optimise performance. The details are:

o Use a 64-bit type for the vm_object `size' and the size argument
to vm_object_allocate().
o Use the correct type for index variables in dev_pager_getpages(),
vm_object_page_clean() and vm_object_page_remove().
o Avoid an overflow in the i386 pmap_object_init_pt().


# 98361 17-Jun-2002 jeff

- Introduce the new M_NOVM option which tells uma to only check the currently
allocated slabs and bucket caches for free items. It will not go ask the vm
for pages. This differs from M_NOWAIT in that it not only doesn't block, it
doesn't even ask.

- Add a new zcreate option ZONE_VM, that sets the BUCKETCACHE zflag. This
tells uma that it should only allocate buckets out of the bucket cache, and
not from the VM. It does this by using the M_NOVM option to zalloc when
getting a new bucket. This is so that the VM doesn't recursively enter
itself while trying to allocate buckets for vm_map_entry zones. If there
are already allocated buckets when we get here we'll still use them but
otherwise we'll skip it.

- Use the ZONE_VM flag on vm map entries and pv entries on x86.


# 95710 29-Apr-2002 peter

Tidy up some loose ends.
i386/ia64/alpha - catch up to sparc64/ppc:
- replace pmap_kernel() with refs to kernel_pmap
- change kernel_pmap pointer to (&kernel_pmap_store)
(this is a speedup since ld can set these at compile/link time)
all platforms (as suggested by jake):
- gc unused pmap_reference
- gc unused pmap_destroy
- gc unused struct pmap.pm_count
(we never used pm_count - we track address space sharing at the vmspace)


# 94777 15-Apr-2002 peter

Pass vm_page_t instead of physical addresses to pmap_zero_page[_area]()
and pmap_copy_page(). This gets rid of a couple more physical addresses
in upper layers, with the eventual aim of supporting PAE and dealing with
the physical addressing mostly within pmap. (We will need either 64 bit
physical addresses or page indexes, possibly both depending on the
circumstances. Leaving this to pmap itself gives more flexibility.)

Reviewed by: jake
Tested on: i386, ia64 and (I believe) sparc64. (my alpha was hosed)


# 92846 20-Mar-2002 jeff

Remove references to vm_zone.h and switch over to the new uma API.


# 92770 20-Mar-2002 alfred

Remove __P.


# 92654 19-Mar-2002 jeff

This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by: arch@


# 91471 28-Feb-2002 silby

Fix a minor swap leak.

Previously, the UPAGES/KSTACK area of processes/threads would leak memory
at the time that a previously swapped process was terminated. Luckily, the
leak was only 12K/proc, so it was unlikely to be a major problem unless you
had an undersized swap partition.

Submitted by: dillon
Reviewed by: silby
MFC after: 1 week


# 91403 27-Feb-2002 silby

Fix a horribly suboptimal algorithm in the vm_daemon.

In order to determine what to page out, the vm_daemon checks
reference bits on all pages belonging to all processes. Unfortunately,
the algorithm used reacted badly with shared pages; each shared page
would be checked once per process sharing it; this caused an O(N^2)
growth of tlb invalidations. The algorithm has been changed so that
each page will be checked only 16 times.

Prior to this change, a fork/sleepbomb of 1300 processes could cause
the vm_daemon to take over 60 seconds to complete, effectively
freezing the system for that time period. With this change
in place, the vm_daemon completes in less than a second. Any system
with hundreds of processes sharing pages should benefit from this change.

Note that the vm_daemon is only run when the system is under extreme
memory pressure. It is likely that many people with loaded systems saw
no symptoms of this problem until they reached the point where swapping
began.

Special thanks go to dillon, peter, and Chuck Cranor, who helped me
get up to speed with vm internals.

PR: 33542, 20393
Reviewed by: dillon
MFC after: 1 week


# 91367 27-Feb-2002 peter

Back out all the pmap related stuff I've touched over the last few days.
There is some unresolved badness that has been eluding me, particularly
affecting uniprocessor kernels. Turning off PG_G helped (which is a bad
sign) but didn't solve it entirely. Userland programs still crashed.


# 91358 27-Feb-2002 peter

Bandaid for the Uniprocessor kernel exploding. This makes a UP kernel
boot and run (and indeed I am committing from it) instead of exploding
during the int 0x15 call from inside the atkbd driver to get the keyboard
repeat rates.


# 91353 27-Feb-2002 alfred

clarify panic message


# 91344 27-Feb-2002 peter

Jake further reduced IPI shootdowns on sparc64 in loops by using ranged
shootdowns in a couple of key places. Do the same for i386. This also
hides some physical addresses from higher levels and has it use the
generic vm_page_t's instead. This will help for PAE down the road.

Obtained from: jake (MI code, suggestions for MD part)


# 91341 26-Feb-2002 dillon

didn't quite undo the last reversion. This gets it.


# 91329 26-Feb-2002 dillon

revert compatibility fix temporarily (thought it would not break anything
leaving it in).


# 91322 26-Feb-2002 dillon

Make peter's commit compatible with interrupt-enabled critical_enter()
and exit(), which has already solved the problem in regards to deadlocked
IPI's.


# 91260 25-Feb-2002 peter

Work-in-progress commit syncing up pmap cleanups that I have been working
on for a while:
- fine grained TLB shootdown for SMP on i386
- ranged TLB shootdowns.. eg: specify a range of pages to shoot down with
a single IPI, since the IPI is very expensive. Adjust some callers
that used to trigger this inside tight loops to do a ranged shootdown
at the end instead.
- PG_G support for SMP on i386 (options ENABLE_PG_G)
- defer PG_G activation till after we decide what we are going to do with
PSE and the 4MB pages at the start of the kernel. This should solve
some rumored strangeness about stale PG_G entries getting stuck
underneath the 4MB pages.
- add some instrumentation for the fine TLB shootdown
- convert some asm instruction wrappers from functions to inlines. gcc
seems to do a fair bit better with this.
- [temporarily!] pessimize the tlb shootdown IPI handlers. I will fix
this again shortly.

This has been working fairly well for me for a while, but I have tweaked
it again prior to commit since my last major testing round. The only
outstanding problem that I know of is PG_G related, which is why there
is an option for it (not on by default for SMP). I have seen a world
speedups by a few percent (as much as 4 or 5% in one case) but I have
*not* accurately measured this - I am a bit sceptical of these numbers.


# 90998 20-Feb-2002 peter

Pass me the pointy hat please. Be sure to return a value in a non-void
function. I've been running with this buried in the mountains of compiler
output for about a month on my desktop.


# 90947 19-Feb-2002 peter

Some more tidy-up of stray "unsigned" variables instead of p[dt]_entry_t
etc.


# 88903 05-Jan-2002 peter

Convert a bunch of 1 << PCPU_GET(cpuid) to PCPU_GET(cpumask).
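
The conversion in miniature (a sketch; both forms read per-cpu data):

        /* before: recompute the mask from the cpu id each time */
        mask = 1 << PCPU_GET(cpuid);

        /* after: read the precomputed per-cpu mask directly */
        mask = PCPU_GET(cpumask);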


# 88838 02-Jan-2002 peter

Allow a specific setting for pv entries. This avoids the need to guess
(or calculate by hand) the effect of interactions between shpgperproc,
physical ram size, maxproc, maxdsiz, etc.


# 88744 31-Dec-2001 dillon

Grrr. The tlb code is strewn over 3 files and I misread it. Revert
the last change (it was a NOP), and remove the XXX comments that no longer
apply.


# 88742 31-Dec-2001 dillon

You know those 'XXX what about SMP' comments in pmap_kenter()? Well,
they were right. Fix both kenter() and kremove() for SMP by ensuring that
the tlb is flushed on other cpu's. This will directly solve random-corruption
panic issues in -stable when it is MFC'd. Better to be safe than sorry; we
can optimize this later.

Original Suspicion by: peter
Maybe MFC: immediately on re's permission


# 88245 20-Dec-2001 peter

Replace a bunch of:

        for (pv = TAILQ_FIRST(&m->md.pv_list);
            pv;
            pv = TAILQ_NEXT(pv, pv_list)) {

with:

        TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) {


# 88240 20-Dec-2001 peter

Fix some whitespace nits, and a minor error that I made in some unused
#ifdef DEBUG code (VM_MAXUSER_ADDRESS vs UPT_MAX_ADDRESS).


# 87702 11-Dec-2001 jhb

Overhaul the per-CPU support a bit:

- The MI portions of struct globaldata have been consolidated into a MI
struct pcpu. The MD per-CPU data are specified via a macro defined in
machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the
interface would be cleaner (PCPU_GET(my_md_field) vs.
PCPU_GET(md.md_my_md_field)).
- All references to globaldata are changed to pcpu instead. In a UP kernel,
this data was stored as global variables which is where the original name
came from. In an SMP world this data is per-CPU and ideally private to each
CPU outside of the context of debuggers. This also included combining
machine/globaldata.h and machine/globals.h into machine/pcpu.h.
- The pointer to the thread using the FPU on i386 was renamed from
npxthread to fpcurthread to be identical with other architectures.
- Make the show pcpu ddb command MI with a MD callout to display MD
fields.
- The globaldata_register() function was renamed to pcpu_init() and now
init's MI fields of a struct pcpu in addition to registering it with
the internal array and list.
- A pcpu_destroy() function was added to remove a struct pcpu from the
internal array and list.

Tested on: alpha, i386
Reviewed by: peter, jake


# 87637 10-Dec-2001 peter

Delete some leftover code from a bygone age. We don't have an array of
IdlePTDs anymore and don't do the PTD[MPPTDI] swapping etc.


# 86486 16-Nov-2001 peter

Fix the non-KSTACK_GUARD case.. It has been broken since the KSE
commit. ptek was not being initialized.


# 86485 16-Nov-2001 peter

Start bringing i386/pmap.c into line with cleanups that were done to
alpha pmap. In particular -
- pd_entry_t and pt_entry_t are now u_int32_t instead of a pointer.
This is to enable cleaner PAE and x86-64 support down the track so
that we can change the pd_entry_t/pt_entry_t types to 64 bit entities.
- Terminate "unsigned *ptep, pte" with extreme prejudice and use the
correct pt_entry_t/pd_entry_t types.
- Various other cosmetic changes to match cleanups elsewhere.
- This eliminates a boatload of casts.
- use VM_MAXUSER_ADDRESS in place of UPT_MIN_ADDRESS in a couple of places
where we're testing user address space limits. Assuming the page tables
start directly after the end of user space is not a safe assumption.
There is still more to go.


# 86443 16-Nov-2001 peter

Oops, I accidently merged a whitespace error from the original commit.
(whitespace at end of line in rev 1.264 pmap.c). Fix them all.


# 86439 16-Nov-2001 peter

Converge/fix some debug code (#if 0'ed on alpha, but whatever)
- use NPTEPG/NPDEPG instead of magic 1024 (important for PAE)
- use pt_entry_t instead of unsigned (important for PAE)
- use vm_offset_t instead of unsigned for va's (important for x86-64)


# 85806 01-Nov-2001 peter

Skip PG_UNMANAGED pages when we're shooting everything down to try and
reclaim pv_entries. PG_UNMANAGED pages don't have pv_entries to reclaim.

Reported by: David Xu <davidx@viasoft.com.cn>


# 85762 31-Oct-2001 dillon

Don't let pmap_object_init_pt() exhaust all available free pages
(allocating pv entries w/ zalloci) when called in a loop due to
an madvise(). It is possible to completely exhaust the free page list and
cause a system panic when an expected allocation fails.


# 84934 14-Oct-2001 tegge

Reduce the number of TLB shootdowns caused by a call to pmap_qenter()
from number of pages mapped to 1.

Reviewed by: dillon


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 82624 31-Aug-2001 peter

Do a style cleanup pass for the pmap_{new,dispose,etc}_proc() functions
to get them closer to the KSE tree. I will do the other $machine/pmap.c
files shortly.


# 82310 25-Aug-2001 julian

Add another comment.
check for 'teh's this time..


# 82309 25-Aug-2001 peter

Optionize UPAGES for the i386. As part of this I split some of the low
level implementation stuff out of machine/globaldata.h to avoid exposing
UPAGES to lots more places. The end result is that we can double
the kernel stack size with 'options UPAGES=4' etc.

This is mainly being done for the benefit of a MFC to RELENG_4 at some
point. -current doesn't really need this so much since each interrupt
runs on its own kstack.


# 82127 22-Aug-2001 dillon

Move most of the kernel submap initialization code, including the
timeout callwheel and buffer cache, out of the platform specific areas
and into the machine independent area. i386 and alpha adjusted here.
Other cpus can be fixed piecemeal.

Reviewed by: freebsd-smp, jake


# 82121 21-Aug-2001 peter

Introduce two new sysctl's.. vm.kvm_size and vm.kvm_free. These are
purely informational and can give some advance indications of tuning
problems. These are i386 only for now as it seems that the i386 is
the only one suffering kvm pressure.
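
A sketch of how such a read-only sysctl can be wired up (handler body
and bounds are illustrative assumptions, not the committed code):

        static int
        kvm_size(SYSCTL_HANDLER_ARGS)
        {
                unsigned long ksize = VM_MAX_KERNEL_ADDRESS - KERNBASE;

                return (sysctl_handle_long(oidp, &ksize, 0, req));
        }
        SYSCTL_PROC(_vm, OID_AUTO, kvm_size, CTLTYPE_LONG | CTLFLAG_RD,
            0, 0, kvm_size, "IU", "Size of KVM");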


# 80431 26-Jul-2001 peter

Make PMAP_SHPGPERPROC tunable. One shouldn't need to recompile a kernel
for this, since it is easy to run into with large systems with lots of
shared mmap space.

Obtained from: yahoo


# 79224 04-Jul-2001 dillon

With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.


# 77082 23-May-2001 alfred

pmap_mapdev needs the vm_mtx; acquire it if not already locked


# 76941 21-May-2001 jhb

Sort includes.


# 76827 18-May-2001 alfred

Introduce a global lock for the vm subsystem (vm_mtx).

vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb


# 76166 01-May-2001 markm

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h from kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


# 74927 28-Mar-2001 jhb

Convert the allproc and proctree locks from lockmgr locks to sx locks.


# 74283 15-Mar-2001 peter

Kill the 4MB kernel limit dead. [I hope :-)].
For UP, we were using $tmp_stk as a stack from the data section. If the
kernel text section grew beyond ~3MB, the data section would be pushed
beyond the temporary 4MB P==V mapping. This would cause the trampoline
up to high memory to fault. The hack workaround I did was to use all of
the page table pages that we already have while preparing the initial
P==V mapping, instead of just the first one.
For SMP, the AP bootstrap process suffered the same sort of problem and
got the same treatment.

MFC candidate - this breaks on 4.x just the same..

Thanks to: Richard Todd <rmtodd@ichotolot.servalan.com>


# 73936 07-Mar-2001 jhb

Unrevert the pmap_map() changes. They weren't broken on x86.

Sense beaten into me by: peter


# 73903 06-Mar-2001 jhb

Back out the pmap_map() change for now, it isn't completely stable on the
i386.


# 73862 06-Mar-2001 jhb

- Rework pmap_map() to take advantage of direct-mapped segments on
supported architectures such as the alpha. This allows us to save
on kernel virtual address space, TLB entries, and (on the ia64) VHPT
entries. pmap_map() now modifies the passed in virtual address on
architectures that do not support direct-mapped segments to point to
the next available virtual address. It also returns the actual
address that the request was mapped to.
- On the IA64 don't use a special zone of PV entries needed for early
calls to pmap_kenter() during pmap_init(). This gets us in trouble
because we end up trying to use the zone allocator before it is
initialized. Instead, with the pmap_map() change, the number of needed
PV entries is small enough that we can get by with a static pool that is
used until pmap_init() is complete.

Submitted by: dfr
Debugging help: peter
Tested by: me


# 72930 22-Feb-2001 peter

Activate USER_LDT by default. The new thread libraries are going to
depend on this. The linux ABI emulator tries to use it for some linux
binaries too. VM86 had a bigger cost than this and it was made default
a while ago.

Reviewed by: jhb, imp


# 71816 29-Jan-2001 jhb

Remove unnecessary locking to protect the p_upages_obj and p_addr
pointers.


# 71526 24-Jan-2001 jhb

- Proc locking.
- P_INMEM -> PS_INMEM.


# 71350 21-Jan-2001 des

First step towards an MP-safe zone allocator:
- have zalloc() and zfree() always lock the vm_zone.
- remove zalloci() and zfreei(), which are now redundant.

Reviewed by: bmilekic, jasone


# 71318 21-Jan-2001 jake

Remove the per-cpu pages used for copy and zero-ing pages of memory
for SMP; just use the same ones as UP. These weren't used without
holding Giant anyway, and the routines that use them would have to
be protected from pre-emption to avoid migrating cpus.


# 71098 16-Jan-2001 peter

Stop doing runtime checking on i386 cpus for cpu class. The cpu is
slow enough as it is, without having to constantly check that it really
is an i386 still. It was possible to compile out the conditionals for
faster cpus by leaving out 'I386_CPU', but it was not possible to
unconditionally compile for the i386. You got the runtime checking whether
you wanted it or not. This makes I386_CPU mutually exclusive with the
other cpu types, and tidies things up a little in the process.

Reviewed by: alfred, markm, phk, benno, jlemon, jhb, jake, grog, msmith,
jasone, dcs, des (and a bunch more people who encouraged it)


# 70861 10-Jan-2001 jake

Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables
other than curproc.


# 69947 12-Dec-2000 jake

- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr. Also provides macros for the flags
passed to specify shared, exclusive or release, which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.


# 69022 22-Nov-2000 jake

Protect the following with a lockmgr lock:

allproc
zombproc
pidhashtbl
proc.p_list
proc.p_hash
nextpid

Reviewed by: jhb
Obtained from: BSD/OS and netbsd


# 68441 07-Nov-2000 alfred

Protect against an infinite loop when prefaulting pages. This can
happen when the vm system maps past the end of an object or tries
to map a zero length object, the pmap layer misses the fact that
offsets wrap into negative numbers and we get stuck.

Found by: Joost Pol aka Nohican <nohican@marcella.niets.org>
Submitted by: tegge


# 67247 17-Oct-2000 ps

Implement write combining for crashdumps. This is useful when
write caching is disabled on both SCSI and IDE disks where large
memory dumps could take up to an hour to complete.

Taking an i386 scsi based system with 512MB of ram and timing (in
seconds) how long it took to complete a dump, the following results
were obtained:

Before: After:
WCE TIME WCE TIME
------------------ ------------------
1 141.820972 1 15.600111
0 797.265072 0 65.480465

Obtained from: Yahoo!
Reviewed by: peter


# 66696 05-Oct-2000 jhb

Replace loadandclear() with atomic_readandclear_int().
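
Usage sketch (atomic_readandclear_int() is the standard atomic(9)
operation; the pte context is an assumption for illustration):

        pt_entry_t oldpte;

        /*
         * Fetch the pte and zero it in one atomic step, so a
         * concurrent hardware update of PG_A/PG_M cannot be lost
         * between the read and the clear.
         */
        oldpte = atomic_readandclear_int(ptep);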


# 66277 22-Sep-2000 ps

Remove the NCPU, NAPIC, NBUS, NINTR config options. Make NAPIC,
NBUS, NINTR dynamic and set NCPU to a maximum of 16 under SMP.

Reviewed by: peter


# 66069 19-Sep-2000 eivind

Better error message when booting an SMP kernel on a UP system.


# 65557 06-Sep-2000 jasone

Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).

Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh


# 64728 16-Aug-2000 tegge

Prepare for a cleanup of pmap module API pollution introduced by the
suggested fix in PR 12378.

Keep track of all existing pmaps independent of existing processes.

This allows for a process to temporarily connect to a different address
space without the risk of missing an update of the original address space if
the kernel grows.

pmap_pinit2() is no longer needed on the i386 platform but is left as a
stub until the alpha pmap code is updated.

PR: 12378


# 61081 29-May-2000 dillon

This is a cleanup patch to Peter's new OBJT_PHYS VM object type
and sysv shared memory support for it. It implements a new
PG_UNMANAGED flag that has slightly different characteristics
from PG_FICTITIOUS.

A new sysctl, kern.ipc.shm_use_phys has been added to enable the
use of physically-backed sysv shared memory rather than swap-backed.
Physically backed shm segments are not tracked with PV entries,
allowing programs which use a large shm segment as a rendezvous
point to operate without eating an insane amount of KVM in the
PV entry management. Read: Oracle.

Peter's OBJT_PHYS object will also allow us to eventually implement
page-table sharing and/or 4MB physical page support for such segments.
We're half way there.


# 61074 29-May-2000 dfr

Brucify the pmap_enter_temporary() changes.


# 61036 28-May-2000 dfr

Add a new pmap entry point, pmap_enter_temporary() to be used during
dumps to create temporary page mappings. This replaces the use of CADDR1
which is fairly x86 specific.

Reviewed by: dillon


# 60755 21-May-2000 peter

Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly
to various pmap_*() functions instead of looking up the physical address
and passing that. In many cases, the first thing the pmap code was doing
was going to a lot of trouble to get back the original vm_page_t, or
its shadow pv_table entry.

Inspired by: John Dyson's 1998 patches.

Also:
Eliminate pv_table as a separate thing and build it into a machine
dependent part of vm_page_t. This eliminates having a separate set of
structures that shadow each other in a 1:1 fashion that we often went to
a lot of trouble to translate from one to the other. (see above)
This happens to save 4 bytes of physical memory for each page in the
system. (8 bytes on the Alpha).

Eliminate the use of the phys_avail[] array to determine if a page is
managed (ie: it has pv_entries etc). Store this information in a flag.
Things like device_pager set it because they create vm_page_t's on the
fly that do not have pv_entries. This makes it easier to "unmanage" a
page of physical memory (this will be taken advantage of in subsequent
commits).

Add a function to add a new page to the freelist. This could be used
for reclaiming the previously wasted pages left over from preloaded
loader(8) files.

Reviewed by: dillon
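
The API shift, sketched with abbreviated argument lists:

    /* Before: callers passed a physical address ... */
    pmap_enter(pmap, va, VM_PAGE_TO_PHYS(m), prot, wired);
    /* ... after: callers pass the vm_page_t itself. */
    pmap_enter(pmap, va, m, prot, wired);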


# 59440 20-Apr-2000 luoqi

IO apics are not necessarily page aligned; they are only required to be aligned
on a 1K boundary. Correct a typo that would cause problems with a second IO apic.

Pointed out by: Steve Passe <smp.csn.net>


# 58132 16-Mar-2000 phk

Eliminate the undocumented, experimental, non-delivering and highly
dangerous MAX_PERF option.


# 57996 13-Mar-2000 bde

Disabled the optimization of not doing an invltlb_1pg() when changing
pte's from zero. The TLB is supposed to be invalidated when pte's are
changed _to_ zero, but this doesn't occur in all cases for global pages
(PG_G stops invltlb() from working, and invltlb_1pg() is not used
enough).

PR: 14141, 16568
Submitted by: dillon


# 53433 19-Nov-1999 phk

Use LIST_FOREACH to traverse the allproc list.

Submitted by: Jake Burkholder jake@checker.org
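
The traversal pattern, for reference (p_list is the link field named in the
locking commit above):

    struct proc *p;

    LIST_FOREACH(p, &allproc, p_list) {
        /* examine each process in turn */
    }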


# 52635 29-Oct-1999 phk

useracc() the prequel:

Merge the contents (less some trivial comments bordering on the silly)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.


# 51183 11-Sep-1999 peter

Make pmap_mapdev() deal with non-page-aligned requests.
Add a corresponding pmap_unmapdev() to release the KVM back to kernel_map.
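
Typical pairing, as a sketch:

    /* Map a device's registers into KVM; pa need not be page aligned. */
    void *regs = pmap_mapdev(pa, size);
    /* ... access the device ... */
    pmap_unmapdev((vm_offset_t)regs, size);    /* return the KVM */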


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 49635 11-Aug-1999 alc

_pmap_allocpte:
If the pte page isn't PQ_NONE, panic rather than silently
covering up the problem.


# 49591 10-Aug-1999 alc

pmap_remove_pages:
Add KASSERT to detect out of range access to the pv_table and
report the errant pte before it's overwritten.
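
The assertion style in question, sketched with illustrative names rather
than the actual check:

    KASSERT(pv_idx < pv_table_size,
        ("pmap_remove_pages: pv_table index %d out of range", pv_idx));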


# 49337 31-Jul-1999 alc

pmap_object_init_pt:
Verify that object != NULL.


# 49304 31-Jul-1999 alc

Add parentheses for clarity.

Submitted by: dillon


# 48963 21-Jul-1999 alc

Fix the following problem:

When creating new processes (or performing exec), the new page
directory is initialized too early. The kernel might grow before
p_vmspace is initialized for the new process. Since pmap_growkernel
doesn't yet know about the new page directory, it isn't updated, and
subsequent use causes a failure.

The fix is (1) to clear p_vmspace early, to stop pmap_growkernel
from stomping on memory, and (2) to defer part of the initialization
of new page directories until p_vmspace is initialized.

PR: kern/12378
Submitted by: tegge
Reviewed by: dfr


# 48677 08-Jul-1999 mckusick

These changes appear to give us benefits with both small (32MB) and
large (1G) memory machine configurations. I was able to run 'dbench 32'
on a 32MB system without bringing the machine to a grinding halt.

* buffer cache hash table now dynamically allocated. This will
have no effect on memory consumption for smaller systems and
will help scale the buffer cache for larger systems.

* minor enhancement to pmap_clearbit(). I noticed that
all the calls to it used constant arguments. Making
it an inline allows the constants to propagate to
deeper inlines and should produce better code.

* removal of inherent vfs_ioopt support through the emplacement
of appropriate #ifdef's, with John's permission. If we do not
find a use for it by the end of the year we will remove it entirely.

* removal of getnewbufloops* counters & sysctl's - no longer
necessary for debugging, getnewbuf() is now optimal.

* buffer hash table functions removed from sys/buf.h and localized
to vfs_bio.c

* VFS_BIO_NEED_DIRTYFLUSH flag and support code added
( bwillwrite() ), allowing processes to block when too many dirty
buffers are present in the system.

* removal of a softdep test in bdwrite() that is no longer necessary
now that bdwrite() no longer attempts to flush dirty buffers.

* slight optimization added to bqrelse() - there is no reason
to test for available buffer space on B_DELWRI buffers.

* addition of reverse-scanning code to vfs_bio_awrite().
vfs_bio_awrite() will attempt to locate clusterable areas
in both the forward and reverse direction relative to the
offset of the buffer passed to it. This will probably not
make much of a difference now, but I believe we will start
to rely on it heavily in the future if we decide to shift
some of the burden of the clustering closer to the actual
I/O initiation.

* Removal of the newbufcnt and lastnewbuf counters that Kirk
added. They do not fix any race conditions that haven't already
been fixed by the gbincore() test done after the only call
to getnewbuf(). getnewbuf() is a static, so there is no chance
of it being misused by other modules. ( Unless Kirk can think
of a specific thing that this code fixes. I went through it
very carefully and didn't see anything ).

* removal of VOP_ISLOCKED() check in flushbufqueues(). I do not
think this check is necessary, the buffer should flush properly
whether the vnode is locked or not. ( yes? ).

* removal of extra arguments passed to getnewbuf() that are not
necessary.

* missed cluster_wbuild() that had to be a cluster_wbuild_wb() in
vfs_cluster.c

* vn_write() now calls bwillwrite() *PRIOR* to locking the vnode,
which should greatly aid flushing operations in heavy load
situations - both the pageout and update daemons will be able
to operate more efficiently.

* removal of b_usecount. We may add it back in later but for now
it is useless. Prior implementations of the buffer cache never
had enough buffers for it to be useful, and current implementations
which make more buffers available might not benefit relative to
the amount of sophistication required to implement a b_usecount.
Straight LRU should work just as well, especially when most things
are VMIO backed. I expect that (even though John will not like
this assumption) directories will become VMIO backed some point soon.

Submitted by: Matthew Dillon <dillon@backplane.com>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>


# 48144 23-Jun-1999 luoqi

Do not setup 4M pdir until all APs are up.


# 47842 08-Jun-1999 dt

Use kmem_alloc_nofault() rather than kmem_alloc_pageable() to allocate
kernel virtual address space for UPAGES.
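
A sketch of the substitution:

    /* Reserve KVA for the UPAGES without backing it with pageable memory. */
    vm_offset_t up = kmem_alloc_nofault(kernel_map, UPAGES * PAGE_SIZE);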


# 47763 05-Jun-1999 luoqi

Fix an accounting problem when prefaulting 4M pages.

PR: kern/11948


# 47678 01-Jun-1999 jlemon

Unifdef VM86.

Reviewed by: silence on -current


# 47575 28-May-1999 alc

pmap_object_init_pt:
The size of vm_object::memq is vm_object::resident_page_count,
not vm_object::size.


# 47292 18-May-1999 alc

pmap_qremove:
Eliminate unnecessary TLB shootdowns.


# 46129 27-Apr-1999 luoqi

Enable vmspace sharing on SMP. Major changes are,
- %fs register is added to trapframe and saved/restored upon kernel entry/exit.
- Per-cpu pages are no longer mapped at the same virtual address.
- Each cpu now has a separate gdt selector table. A new segment selector
is added to point to per-cpu pages, per-cpu global variables are now
accessed through this new selector (%fs). The selectors in gdt table are
rearranged for cache line optimization.
- fast_vfork is now on by default for both UP and SMP.
- Some aio code cleanup.

Reviewed by: Alan Cox <alc@cs.rice.edu>
John Dyson <dyson@iquest.net>
Julian Elischer <julian@whistel.com>
Bruce Evans <bde@zeta.org.au>
David Greenman <dg@root.com>


# 46072 25-Apr-1999 alc

pmap_dispose_proc and pmap_copy_page:
Conditionally compile 386-specific code.

pmap_enter:
Eliminate unnecessary TLB shootdowns.

pmap_zero_page and pmap_zero_page_area:
Use invltlb_1pg instead of duplicating the code.


# 45960 23-Apr-1999 dt

Make pmap_collect() an official pmap interface.


# 45833 19-Apr-1999 alc

_pmap_unwire_pte_hold and pmap_remove_page:
Use pmap_TLB_invalidate instead of invltlb_1pg to eliminate
unnecessary IPIs.

pmap_remove, pmap_protect and pmap_remove_pages:
Use pmap_TLB_invalidate_all instead of invltlb to eliminate
unnecessary IPIs.

pmap_copy:
Use cpu_invltlb instead of invltlb when updating APTDpde.

pmap_changebit:
Rather than deleting the unused "set bit" option (which may be
useful later), make pmap_changebit an inline that is used
by the new pmap_clearbit procedure.

Collectively, the first three changes reduce the number of TLB shootdown
IPIs by 1/3 for a kernel compile.


# 45524 10-Apr-1999 alc

pmap_remove_pte:
Use "loadandclear" to update the pte.

pmap_changebit and pmap_ts_referenced:
Switch to pmap_TLB_invalidate from invltlb.


# 45405 07-Apr-1999 msmith

mem.c
Split out ioctl handler a little more cleanly, add memory
range attribute handling for both kernel and user-space
consumers.

pmap.c
Remove obsolete P6 MTRR-related code.

i686_mem.c
Map generic memory-range attribute interface to the P6 MTRR
model.


# 45370 06-Apr-1999 alc

Two changes to pmap_remove_all:

1. Switch to pmap_TLB_invalidate from invltlb, eliminating a full TLB
flush where a single-page flush suffices. (Also, this eliminates some
unnecessary IPIs.)

2. Use "loadandclear" to update the pte, eliminating a race condition
on SMPs.

Change #2 should be committed to -STABLE.


# 45347 05-Apr-1999 julian

Catch a case spotted by Tor where mmapped files could leave garbage in the
unallocated parts of the last page when the file ended on a frag
but not a page boundary.
Delimited by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF,
in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h
vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c
ufs/ufs/ufs_readwrite.c kern/vfs_bio.c

Submitted by: Matt Dillon <dillon@freebsd.org>
Reviewed by: Alan Cox <alc@freebsd.org>


# 45252 02-Apr-1999 alc

Put in place the infrastructure for improved UP and SMP TLB management.

In particular, replace the unused field pmap::pm_flag by pmap::pm_active,
which is a bit mask representing which processors have the pmap activated.
(Thus, it is a simple Boolean on UPs.)

Also, eliminate an unnecessary memory reference from cpu_switch()
in swtch.s.

Assisted by: John S. Dyson <dyson@iquest.net>
Tested by: Luoqi Chen <luoqi@watermarkgroup.com>,
Poul-Henning Kamp <phk@critter.freebsd.dk>
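
How a bit-mask pm_active behaves, as an illustrative sketch (the real
manipulation lives in pmap_activate() and cpu_switch()):

    pmap->pm_active |= 1 << cpuid;      /* this CPU now runs the pmap */
    /* ... later, on switch-out ... */
    pmap->pm_active &= ~(1 << cpuid);   /* no longer active here */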


# 44704 13-Mar-1999 alc

pmap_qenter/pmap_qremove:
Use the pmap_kenter/pmap_kremove inline functions
instead of duplicating them.

pmap_remove_all:
Eliminate an unused (but initialized) variable.

pmap_ts_reference:
Change the implementation. The new implementation is much smaller
and simpler, but functionally identical. (Reviewed by
"John S. Dyson" <dyson@iquest.net>.)


# 44470 05-Mar-1999 alc

Fix an SMP-only TLB invalidation bug. Specifically, disable
a TLB invalidation optimization that won't work given the
limitations of our current SMP support.

This patch should be applied to -stable ASAP.

Thanks to John Capo <jc@irbs.com>,
Steve Kargl <sgk@troutmask.apl.washington.edu>, and
Chuck Robey <chuckr@mat.net>
for testing.


# 44146 19-Feb-1999 luoqi

Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This
is the preparation step for moving pmap storage out of vmspace proper.

Reviewed by: Alan Cox <alc@cs.rice.edu>
Matthew Dillon <dillon@apollo.backplane.com>
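
The accessor in use (sketch):

    /* Before: pmap_t pm = &p->p_vmspace->vm_pmap; */
    pmap_t pm = vmspace_pmap(p->p_vmspace);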


# 43314 27-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


# 43138 24-Jan-1999 dillon

Change all manual settings of vm_page_t->dirty = VM_PAGE_BITS_ALL
to use the vm_page_dirty() inline.

The inline can thus do sanity checks ( or not ) over all cases.
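
The mechanical change, sketched:

    /* Before: */
    m->dirty = VM_PAGE_BITS_ALL;
    /* After (the inline may additionally assert the page's state): */
    vm_page_dirty(m);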


# 42957 21-Jan-1999 dillon

This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.

Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>


# 42542 11-Jan-1999 eivind

Silence warnings by removing unused convenience function and
globalizing debugging functions.


# 42448 09-Jan-1999 dt

Oops --<, replace 1.216 with a version that actually checks pv_entries (and
was tested for a month or two in production).

Noticed by: Stephen McKay

Stephen also suggested removing the complication altogether. I haven't done
that, as it would amount to backing out a large part of 1.190 (from 1998/03/16)...


# 42396 08-Jan-1999 luoqi

Allocate kernel page table object (kptobj) before any kmem_alloc calls.
On a system with a large amount of ram (e.g. 2G), allocation of per-page
data structures (512K physical pages) could easily bust the initial kernel
page table (36M), and growth of kernel page table requires kptobj.


# 42381 07-Jan-1999 dt

Make pmap_ts_referenced check more than 1 pv_entry. (One should be careful
when moving elements to the tail of a list in a loop...)


# 41591 07-Dec-1998 archie

The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.


# 41370 26-Nov-1998 tegge

Don't forget to update the pmap associated with aio daemons when adding
new page directory entries for a growing kernel virtual address space.


# 41318 24-Nov-1998 eivind

Move the definition of PPro_vmtrr from the header file to pmap.c,
replacing the one in the header file with a declaration. This makes it
easier to work with tools that grok ANSI C only.


# 40998 08-Nov-1998 msmith

Enable 686 class optimisations for all 686-class processors, not just the
Pentium Pro. This resolves the "Dog slow SMP" issue for Pentium II
systems.


# 40700 28-Oct-1998 dg

Added a second argument, "activate" to the vm_page_unwire() call so that
the caller can select either inactive or active queue to put the page on.
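
Sketch of the widened call:

    vm_page_unwire(m, 0);    /* activate == 0: queue the page as inactive */
    /* or: */
    vm_page_unwire(m, 1);    /* activate == 1: queue the page as active */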


# 40545 21-Oct-1998 dg

Decrement the now unused page table page's wire_count prior to freeing it.
It will soon be required that pages have a zero wire_count when being
freed.


# 38893 06-Sep-1998 tegge

Don't go below the low water mark of free pages due to optional prefaulting
of pages.
PR: 2431


# 38807 04-Sep-1998 ache

PAGE_WAKEUP -> vm_page_wakeup


# 38488 23-Aug-1998 bde

Fixed printf format errors.


# 38349 15-Aug-1998 bde

pmap.c:
Cast pointers to (vm_offset_t) instead of to (u_long) (as before) or to
(uintptr_t)(void *) (as would be more correct). Don't cast vm_offset_t's
to (u_long) just to do arithmetic on them.

mp_machdep.c:
Cast pointers to (uintptr_t) instead of to (u_long). Don't forget
to cast pointers to (void *) first, or to recover from possible
integral promotions, although this is too much work for
machine-dependent code.

vm code generally avoids warnings for pointer vs long size mismatches
by using vm_offset_t to represent pointers; pmap.c often uses plain
`unsigned int' instead of vm_offset_t and didn't use u_long elsewhere,
but this style was messed up by code apparently imported from mp_machdep.c.


# 37561 11-Jul-1998 bde

Fixed printf format errors.


# 37557 11-Jul-1998 phk

Don't disable pmap_setdevram(), which isn't called but could be;
instead disable pmap_setvidram(), which is called but probably
shouldn't be.

PR: 7227, 7240


# 37555 11-Jul-1998 bde

Fixed printf format errors.


# 36275 21-May-1998 dyson

Make flushing dirty pages work correctly on filesystems that
unexpectedly do not complete writes even with sync I/O requests.
This should help the behavior of mmaped files when using
softupdates (and perhaps in other circumstances also.)


# 36179 19-May-1998 phk

Make the size of the msgbuf (dmesg) a "normal" option.


# 36169 18-May-1998 tegge

Back out part of revision 1.198 commit (clearing kernel stack pages).
By request from David Greenman <dg@root.com>


# 36125 17-May-1998 tegge

For SMP, use prv_PPAGE1/prv_PMAP1 instead of PADDR1/PMAP1.
get_ptbase and pmap_pte_quick no longer generate IPIs.
This should reduce the number of IPIs during heavy paging.


# 36121 17-May-1998 tegge

Clear kernel stack pages before usage.
Correct panic message in pmap_zero_page (s/CMAP /CMAP2 /).


# 36051 15-May-1998 dyson

Disable the auto-Write Combining setup for the pmap code. This
worked on a couple of machines of mine, but appears to cause problems
on others.


# 35940 11-May-1998 dyson

Change some tests from CPU_CLASS686 to CPU_686 as appropriate, and
also correct a serious omission that would cause process failures
due to forgetting an invltlb-type operation. This was just a
transcription problem.


# 35933 11-May-1998 dyson

Support better performance with P6 architectures and in SMP
mode. Unnecessary TLB flushes removed. More efficient
page zeroing on P6 (modify page only if non-zero.)


# 35932 10-May-1998 dyson

Attempt to set write combining mode for graphics devices.


# 35296 19-Apr-1998 bde

Support compiling with gcc -pedantic (don't use a bogus, null cast).


# 35210 15-Apr-1998 bde

Support compiling with `gcc -ansi'.


# 35074 06-Apr-1998 peter

Bogus casts


# 34611 15-Mar-1998 dyson

Some VM improvements, including elimination of a lot of Sig-11
problems. Tor Egge and others have helped with various VM bugs
lately, but don't blame him -- blame me!!!

pmap.c:
1) Create an object for kernel page table allocations. This
fixes a bogus allocation method previously used for such, by
grabbing pages from the kernel object, using bogus pindexes.
(This was a code cleanup, and perhaps a minor system stability
issue.)

pmap.c:
2) Pre-set the modify and accessed bits when prudent. This will
decrease bus traffic under certain circumstances.

vfs_bio.c, vfs_cluster.c:
3) Rather than calculating the beginning virtual byte offset
multiple times, stick the offset into the buffer header, so
that the calculated offset can be reused. (Long long multiplies
are often expensive, and this is a probably unmeasurable performance
improvement, and code cleanup.)

vfs_bio.c:
4) Handle write recursion more intelligently (but not perfectly) so
that it is less likely to cause a system panic, and is also
much more robust.

vfs_bio.c:
5) getblk incorrectly wrote out blocks that are incorrectly sized.
The problem is fixed: blocks are now written out ONLY when B_DELWRI
is true.

vfs_bio.c:
6) Check that already constituted buffers have fully valid pages. If
not, then make sure that the B_CACHE bit is not set. (This was
a major source of Sig-11 type problems.)

vfs_bio.c:
7) Fix a potential system deadlock due to an incorrectly specified
sleep priority while waiting for a buffer write operation. The
change that I made opens the system up to serious problems, and
we need to examine the issue of process sleep priorities.

vfs_cluster.c, vfs_bio.c:
8) Make clustered reads work more correctly (and more completely)
when buffers are already constituted, but not fully valid.
(This was another system reliability issue.)

vfs_subr.c, ffs_inode.c:
9) Create a vtruncbuf function, which is used by filesystems that
can truncate files. The vinvalbuf forced a file sync type operation,
while vtruncbuf only invalidates the buffers past the new end of file,
and also invalidates the appropriate pages. (This was a system reliability
and performance issue.)

10) Modify FFS to use vtruncbuf.

vm_object.c:
11) Make the object rundown mechanism for OBJT_VNODE type objects work
more correctly. Included in that fix, create pager entries for
the OBJT_DEAD pager type, so that paging requests that might slip
in during race conditions are properly handled. (This was a system
reliability issue.)

vm_page.c:
12) Make some of the page validation routines be a little less picky
about arguments passed to them. Also, make page invalidation
change the object generation count so that we handle generation
counts a little more robustly.

vm_pageout.c:
13) Further reduce pageout daemon activity when the system doesn't
need help from it. There should be no additional performance
decrease even when the pageout daemon is running. (This was
a significant performance issue.)

vnode_pager.c:
14) Teach the vnode pager to handle race conditions during vnode
deallocations.


# 34440 09-Mar-1998 eivind

Turn "PMAP_SHPGPERPROC" into a new-style option, add it to LINT, and
document it there.


# 34206 07-Mar-1998 dyson

This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code. These
problems have manifested themselves worst on NFS-type filesystems, but can
still affect local filesystems under certain circumstances. Most of
the problems have involved mmap consistency, and as a side-effect broke
the vfs.ioopt code. This code might have been committed separately, but
almost everything is interrelated.

1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
are fully valid.
2) Rather than deactivating erroneously read initial (header) pages in
kern_exec, we now free them.
3) Fix the rundown of non-VMIO buffers that are in an inconsistent
(missing vp) state.
4) Fix the disassociation of pages from buffers in brelse. The previous
code had rotted and was faulty in a couple of important circumstances.
5) Remove a gratuitous buffer wakeup in vfs_vmio_release.
6) Remove a crufty and currently unused cluster mechanism for VBLK
files in vfs_bio_awrite. When the code is functional, I'll add back
a cleaner version.
7) The page busy count wakeups associated with the buffer cache usage were
incorrectly cleaned up in a previous commit by me. Revert to the
original, correct version, but with a cleaner implementation.
8) The cluster read code now tries to keep data associated with buffers
more aggressively (without breaking the heuristics) when it is presumed
that the read data (buffers) will be soon needed.
9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The
delay loop waiting is not useful for filesystem locks, due to the
length of the time intervals.
10) Correct and clean-up spec_getpages.
11) Implement a fully functional nfs_getpages, nfs_putpages.
12) Fix nfs_write so that modifications are coherent with the NFS data on
the server disk (at least as well as NFS seems to allow.)
13) Properly support MS_INVALIDATE on NFS.
14) Properly pass down MS_INVALIDATE to lower levels of the VM code from
vm_map_clean.
15) Better support the notion of pages being busy but valid, so that
fewer in-transit waits occur. (use p->busy more for pageouts instead
of PG_BUSY.) Since the page is fully valid, it is still usable for
reads.
16) It is possible (in error) for cached pages to be busy. Make the
page allocation code handle that case correctly. (It should probably
be a printf or panic, but I want the system to handle coding errors
robustly. I'll probably add a printf.)
17) Correct the design and usage of vm_page_sleep. It didn't handle
consistency problems very well, so make the design a little less
lofty. After vm_page_sleep, if it ever blocked, it is still important
to relookup the page (if the object generation count changed), and
verify its status (always.)
18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20) Fix vm_pager_put_pages and its descendants to support an int flag
instead of a boolean, so that we can pass down the invalidate bit.


# 33936 01-Mar-1998 dyson

1) Use a more consistent page wait methodology.
2) Do not unnecessarily force page blocking when paging
pages out.
3) Further improve swap pager performance and correctness,
including fixing the paging in progress deadlock (except
in severe I/O error conditions.)
4) Enable vfs_ioopt=1 as a default.
5) Fix and enable the page prezeroing in SMP mode.

All in all, SMP systems especially should show a significant
improvement in "snappyness."


# 33282 12-Feb-1998 bde

Fixed initialization of the 4MB page. Kernels larger than about 2.75MB
(from _btext to _end) crashed in pmap_bootstrap(). Smaller kernels
worked accidentally.


# 33221 10-Feb-1998 eivind

Fix warning after previous staticization.


# 33181 09-Feb-1998 eivind

Staticize.


# 33134 06-Feb-1998 eivind

Back out DIAGNOSTIC changes.


# 33109 05-Feb-1998 dyson

1) Start using a cleaner and more consistent page allocator instead
of the various ad-hoc schemes.
2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup.
3) When appropriate, set the PG_A or PG_M bits a-priori to both avoid some
processor errata, and to minimize redundant processor updating of page
tables.
4) Modify pmap_protect so that it can only remove permissions (as it
originally supported.) The additional capability is not needed.
5) Streamline read-only to read-write page mappings.
6) For pmap_copy_page, don't enable write mapping for source page.
7) Correct and clean-up pmap_incore.
8) Cluster initial kern_exec pagein.
9) Removal of some minor lint from kern_malloc.
10) Correct some ioopt code.
11) Remove some dead code from the MI swapout routine.
12) Correct vm_object_deallocate (to remove backing_object ref.)
13) Fix dead object handling, that had problems under heavy memory load.
14) Add minor vm_page_lookup improvements.
15) Some pages are not in objects, and make sure that the vm_page.c can
properly support such pages.
16) Add some more page deficit handling.
17) Some minor code readability improvements.


# 33108 04-Feb-1998 eivind

Turn DIAGNOSTIC into a new-style option.


# 33056 03-Feb-1998 bde

Converted DISABLE_PSE to a new-style option.

Fixed some formatting in options.i386.


# 32937 31-Jan-1998 dyson

Change the busy page mgmt, so that when pages are freed, they
MUST be PG_BUSY. It is bogus to free a page that isn't busy,
because it is in a state of being "unavailable" when being
freed. The additional advantage is that the page_remove code
has a better cross-check that the page should be busy and
unavailable for other use. There were some minor problems
with the collapse code, and this plugs those subtle "holes."

Also, the vfs_bio code wasn't checking correctly for PG_BUSY
pages. I am going to develop a more consistent scheme for
grabbing pages, busy or otherwise. For now, we are stuck
with the current morass.


# 32702 22-Jan-1998 dyson

VM level code cleanups.

1) Start using TSM (type-stable memory).
Struct procs continue to point to upages structure, after being freed.
Struct vmspace continues to point to pte object and kva space for kstack.
u_map is now superfluous.
2) vm_map's don't need to be reference counted. They always exist either
in the kernel or in a vmspace. The vmspaces are managed by reference
counts.
3) Remove the "wired" vm_map nonsense.
4) No need to keep a cache of kernel stack kva's.
5) Get rid of strange looking ++var, and change to var++.
6) Change more data structures to use our "zone" allocator. Added
struct proc, struct vmspace and struct vnode. This saves a significant
amount of kva space and physical memory. Additionally, this enables
TSM for the zone managed memory.
7) Keep ioopt disabled for now.
8) Remove the now bogus "single use" map concept.
9) Use generation counts or id's for data structures residing in TSM, where
it allows us to avoid unneeded restart overhead during traversals, where
blocking might occur.
10) Account better for memory deficits, so the pageout daemon will be able
to make enough memory available (experimental.)
11) Fix some vnode locking problems. (From Tor, I think.)
12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
(experimental.)
13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c
code. Use generation counts, get rid of unneeded collapse operations,
and clean up the cluster code.
14) Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step. Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)


# 32585 17-Jan-1998 dyson

Tie up some loose ends in vnode/object management. Remove an unneeded
config option in pmap. Fix a problem with faulting in pages. Clean-up
some loose ends in swap pager memory management.

The system should be much more stable, but all subtle bugs aren't fixed yet.


# 31934 22-Dec-1997 dyson

Correct my previous fix for the UPAGES problem.


# 31930 21-Dec-1997 dyson

Hopefully fix the problem with the TLB not being updated correctly.
Problem tracked down by bde@freebsd.org, but this is an attempted
efficient fix.


# 31709 14-Dec-1997 dyson

After one of my analysis passes to evaluate methods for SMP TLB mgmt, I
noticed some major enhancements available for UP situations. The number
of UP TLB flushes is decreased dramatically with these
changes. Since a TLB flush appears to cost at minimum approximately 80 cycles,
this is a "nice" enhancement, equivalent to eliminating between 40 and 160
instructions per TLB flush.

Changes include making sure that kernel threads all use the same PTD,
and eliminate unneeded PTD switches at context switch time.


# 31321 20-Nov-1997 bde

Moved some extern declarations to header files (unused ones to /dev/null).


# 31030 07-Nov-1997 tegge

Use UPAGES when setting up private pages for SMP (which includes idle stack).


# 31017 07-Nov-1997 phk

Rename some local variables to avoid shadowing other local variables.

Found by: -Wshadow


# 31016 07-Nov-1997 phk

Remove a bunch of variables which were unused both in GENERIC and LINT.

Found by: -Wunused


# 30813 28-Oct-1997 bde

Removed unused #includes.


# 30754 26-Oct-1997 dyson

Check that the pv_limits are initialized before checking against them.


# 30732 26-Oct-1997 dyson

Change the initial amount of memory allocated for pv_entries to be proportional
to the amount of system memory. Also, clean-up some of the new pv_entry
mgmt code.


# 30702 25-Oct-1997 dyson

Somehow an error crept in during the previous commit.


# 30701 25-Oct-1997 dyson

Support garbage collecting the pmap pv entries. The management doesn't
happen until the system would have nearly failed anyway, so no significant
overhead is added. This helps large systems with lots of processes.


# 30700 24-Oct-1997 dyson

Decrease the initial allocation for the zone allocations.


# 30309 11-Oct-1997 phk

Distribute and staticize a lot of the malloc M_* types.

Substantial input from: bde
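
The per-file pattern this moves to, sketched with a hypothetical type:

    static MALLOC_DEFINE(M_FROBDATA, "frobdata", "frob driver buffers");

    void *buf = malloc(len, M_FROBDATA, M_WAITOK);
    free(buf, M_FROBDATA);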


# 29655 21-Sep-1997 dyson

Add support for more than 1 page of idle process stack on SMP systems.


# 29174 06-Sep-1997 dyson

Fix an intermittent problem during SMP code operation. Not all of the
idle page table directories for all of the processors were being updated
during kernel grow operations. The problem appears to be gone now.


# 28808 26-Aug-1997 peter

Clean up the SMP AP bootstrap and eliminate the wretched idle procs.

- We now have enough per-cpu idle context, the real idle loop has been
revived (cpu's halt now with nothing to do).
- Some preliminary support for running some operations outside the
global lock (eg: zeroing "free but not yet zeroed pages") is present
but appears to cause problems. Off by default.
- the smp_active sysctl now behaves differently. It's merely a 'true/false'
option. Setting smp_active to zero causes the AP's to halt in the idle
loop and stop scheduling processes.
- bootstrap is a lot safer. Instead of sharing a statically compiled in
stack a number of times (which has caused lots of problems) and then
abandoning it, we use the idle context to boot the AP's directly. This
should help >2 cpu support since the bootlock stuff was in doubt.
- print physical apic id in traps.. helps identify private pages getting
out of sync. (You don't want to know how much hair I tore out with this!)

More cleanup to follow, this is more of a checkpoint than a
'finished' thing.


# 28747 25-Aug-1997 bde

Finished (?) support for DISABLE_PSE option. 2-3MB of kernel vm was sometimes
wasted.

Fixed type mismatches for functions with vm_prot_t's as args. vm_prot_t
is u_char, so the prototypes should have used promoteof(u_char) to match
the old-style function definitions. They use just vm_prot_t. This depends
on gcc features to work. I fixed the definitions since this is easiest.
The correct fix may be to change vm_prot_t to u_int, to optimize for time
instead of space.

Removed a stale comment.


# 27950 07-Aug-1997 dyson

Fix the DDB breakpoint code when using the 4MB page support.


# 27947 07-Aug-1997 dyson

More vm_zone cleanup. The sysctl now accounts for items better, and
counts the number of allocations.


# 27923 05-Aug-1997 dyson

Another attempt at cleaning up the new memory allocator.


# 27922 05-Aug-1997 dyson

Fix some bugs, document vm_zone better. Add copyright to vm_zone.h. Use
the new zone code in pmap.c so that we can get rid of the ugly ad-hoc
allocations in pmap.c.


# 27904 04-Aug-1997 dyson

Modify pmap to use our new memory allocator.


# 27903 04-Aug-1997 dyson

Slightly reorder some operations so that the main processor gets global
mappings early on.


# 27902 04-Aug-1997 dyson

Remove the PMAP_PVLIST conditionals in pmap.*, and another unneeded define.


# 27566 20-Jul-1997 dyson

Fix a crash that has manifested itself while running X after the 4MB
page upgrades.


# 27535 20-Jul-1997 bde

Removed unused #includes.


# 27484 17-Jul-1997 dyson

Hopefully fix a few problems that could cause hangs in SMP mode.
1) Make sure that the region mapped by a 4MB page is
properly aligned.
2) Don't turn on the PG_G flag in locore for SMP. I plan
to do that later in startup anyway.
3) Make sure the 2nd processor has PSE enabled, so that 4MB
pages don't hose it.

We don't use PG_G yet on SMP -- there is work to be done to make that
work correctly. It isn't that important anyway...


# 27464 17-Jul-1997 dyson

Add support for 4MB pages. This includes the .text, .data, .data parts
of the kernel, and also most of the dynamic parts of the kernel. Additionally,
4MB pages will be allocated for display buffers as appropriate (only.)

The 4MB support for SMP isn't complete, but doesn't interfere with operation
either.


# 26944 25-Jun-1997 tegge

Allow kernel configuration file to override PMAP_SHPGPERPROC. The default
value (200) is too low in some environments, causing a fatal
"panic: get_pv_entry: cannot get a pv_entry_t". The same panic might
still occur due to temporary shortage of free physical memory
(cf. PR i386/2431).


# 26812 22-Jun-1997 peter

Preliminary support for per-cpu data pages.

This eliminates a lot of #ifdef SMP type code. Things like _curproc reside
in a data page that is unique on each cpu, eliminating the expensive macros
like: #define curproc (SMPcurproc[cpunumber()])

There are some unresolved bootstrap and address space sharing issues at
present, but Steve is waiting on this for other work. There is still some
strictly temporary code present that isn't exactly pretty.

This is part of a larger change that has run into some bumps, this part is
standalone so it should be safe. The temporary code goes away when the
full idle cpu support is finished.

Reviewed by: fsmp, dyson


# 26270 29-May-1997 fsmp

Code such as apic_base[APIC_ID] converted to lapic->id

Changes to pmap.c for lapic_t lapic && ioapic_t ioapic pointers,
currently equal to apic_base && io_apic_base, will stand alone with the
private page mapping.


# 26267 29-May-1997 peter

remove no longer needed opt_smp.h includes


# 25194 27-Apr-1997 peter

Whoops.. We forgot to turn off the 4MB Virtual==Physical mapping at address
zero from bootstrap in the non-SMP case.

Noticed by: bde


# 25164 26-Apr-1997 peter

Man the liferafts! Here comes the long awaited SMP -> -current merge!

There are various options documented in i386/conf/LINT, there is more to
come over the next few days.

The kernel should run pretty much "as before" without the options to
activate SMP mode.

There are a handful of known "loose ends" that need to be fixed, but
have been put off since the SMP kernel is in a moderately good condition
at the moment.

This commit is the result of the tinkering and testing over the last 14
months by many people. A special thanks to Steve Passe for implementing
the APIC code!


# 24851 13-Apr-1997 dyson

The pmap code was too generous in the allocation of kva space for
the pv entries. This problem has become obvious due to the increase
in the size of the pv entries. We need to create a more intelligent
policy for pv entry management eventually.
Submitted by: David Greenman <dg@freebsd.org>


# 24848 12-Apr-1997 dyson

Fully implement vfork. Vfork is now much much faster than even our
fork. (On my machine, fork is about 240usecs, vfork is 78usecs.)

Implement rfork(!RFPROC !RFMEM), which allows a thread to divorce its memory
from the other threads of a group.

Implement rfork(!RFPROC RFCFDG), which closes all file descriptors, eliminating
possible existing shares with other threads/processes.

Implement rfork(!RFPROC RFFDG), which divorces the file descriptors for a
thread from the rest of the group.

Fix the case where a thread does an exec. It is almost nonsense for a thread
to modify the other threads' address space by an exec, so we
now automatically divorce the address space before modifying it.


# 24691 07-Apr-1997 peter

The biggie: Get rid of the UPAGES from the top of the per-process address
space. (!)

Have each process use the kernel stack and pcb in the kvm space. Since
the stacks are at a different address, we cannot copy the stack at fork()
and allow the child to return up through the function call tree to return
to user mode - create a new execution context and have the new process
begin executing from cpu_switch() and go to user mode directly.
In theory this should speed up fork a bit.

Context switch the tss_esp0 pointer in the common tss. This is a lot
simpler than switching the gdt[GPROC0_SEL].sd.sd_base pointer
to each process's tss, since the esp0 pointer is a 32 bit pointer, and the
sd_base setting is split into three different bit sections at non-aligned
boundaries and requires a lot of twiddling to reset.

The 8K of memory at the top of the process space is now empty, and unmapped
(and unmappable, it's higher than VM_MAXUSER_ADDRESS).

Simplify the pmap code that manages process contexts; we no longer have to
double map the UPAGES, which simplifies and should measurably speed up fork().

The following parts came from John Dyson:

Set PG_G on the UPAGES that are now in kernel context, and invalidate
them when swapping them out.

Move the upages object (upobj) from the vmspace to the proc structure.

Now that the UPAGES (pcb and kernel stack) are out of user space, make
rfork(..RFMEM..) do what was intended by sharing the vmspace
entirely via reference counting rather than simply inheriting the mappings.


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 22521 10-Feb-1997 dyson

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 22129 30-Jan-1997 dg

Removed unnecessary PG_N flag from device memory mappings. This is handled
by the CPU/chipset already and was apparently triggering a hardware bug that
causes strange parity errors.


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 21568 11-Jan-1997 dyson

When we changed pmap_protect to support adding the writeable
attribute to a page range, we forgot to set the PG_WRITEABLE
flag in the vm_page_t. This fixes that problem.


# 21529 11-Jan-1997 dyson

Prepare better for multi-platform by eliminating another required
pmap routine (pmap_is_referenced.) Upper level recoded to use
pmap_ts_referenced.


# 20997 29-Dec-1996 dyson

Allow pmap_protect to increase permissions. This mod can eliminate
the need for unnecessary vm_faults.
Submitted by: Alan Cox <alc@cs.rice.edu>


# 19621 11-Nov-1996 dyson

Support the PG_G flag on Pentium-Pro processors. This pretty
much eliminates the unnecessary unmapping of the kernel during
context switches and during invtlb...


# 19503 07-Nov-1996 joerg

Fix the message buffer mapping. This actually allows increasing
the message buffer size in <sys/msgbuf.h>.

Reviewed by: davidg,joerg
Submitted by: bde


# 19346 03-Nov-1996 dyson

Fix a problem with running down processes that have left wired
mappings with mlock. This problem only occurred because of the
quick unmap code not respecting the wired-ness of pages in the
process. In the future, we need to eliminate the dependency
intrinsic to the design of the code that wired pages actually
be mapped. It is kind-of bogus not to have wired pages mapped,
but it is also a weakness for the code to fall flat because
of a missing page.

This should fix a problem that Tor Egge has been having, and also
should be included into 2.2-RELEASE.


# 19119 23-Oct-1996 dyson

Account for the UPAGES in the same way as before moving the MD code
from vm_glue into pmap.c. Now RSS should appear to be the same as before.


# 18937 15-Oct-1996 dyson

Move much of the machine dependent code from vm_glue.c into
pmap.c. Along with the improved organization, small proc fork
performance is now about 5%-10% faster.


# 18904 12-Oct-1996 dyson

Minor optimization for final rundown of a pmap.


# 18897 12-Oct-1996 dyson

Performance optimizations. One of which was meant to go in before the
previous snap. Specifically, kern_exit and kern_exec now makes a
call into the pmap module to do a very fast removal of pages from the
address space. Additionally, the pmap module now updates the PG_MAPPED
and PG_WRITABLE flags. This is an optional optimization, but helpful
on the X86.


# 18896 12-Oct-1996 bde

Cleaned up:
- fixed a sloppy common-style declaration.
- removed an unused macro.
- moved once-used macros to the one file where they are used.
- removed unused forward struct declarations.
- removed __pure.
- declared inline functions as inline in their prototype as well
as in theire definition (gcc unfortunately allows the prototype
to be inconsistent).
- staticized.


# 18842 09-Oct-1996 bde

Put I*86_CPU defines in opt_cpu.h.


# 18548 28-Sep-1996 dyson

Essentially rename pmap_update to be invltlb. It is a very machine
dependent operation, and not really a correct name. invltlb and invlpg
are more descriptive, and in the case of invlpg, a real opcode.

Additionally, fix the tlb management code for 386 machines.


# 18538 28-Sep-1996 bde

Restored my change in rev.1.119 which was clobbered by the previous commit.


# 18528 28-Sep-1996 dyson

Move pmap_update_1pg to cpufunc.h. Additionally,
use the invlpg opcode instead of the nasty looking .byte directives.
There are some other minor micro-level code improvements to pmap.c
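
The opcode-based inline looks roughly like the eventual cpufunc.h version:

    static __inline void
    invlpg(u_int addr)
    {
        __asm __volatile("invlpg %0" : : "m" (*(char *)addr) : "memory");
    }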


# 18275 13-Sep-1996 bde

Made debugging code (pmap_pvdump()) compile again so that I can test LINT.
I don't know if it actually works.


# 18260 12-Sep-1996 dyson

Primarily a fix so that pages are properly tracked for being
modified. Pages that are removed by the pageout daemon were
the worst affected. Additionally, numerous minor cleanups,
including better handling of busy page table pages. This
commit fixes the worst of the pmap problems recently introduced.


# 18239 11-Sep-1996 dyson

A minor fix to the new pmap code. This might not fix the global problems
with the last major pmap commits.


# 18169 08-Sep-1996 dyson

Addition of page coloring support. Various levels of coloring are afforded.
The default level works with minimal overhead, but one can also enable
full, efficient use of a 512K cache. (Parameters can be generated
to support arbitrary cache sizes also.)


# 18163 08-Sep-1996 dyson

Improve the scalability of certain pmap operations.


# 17334 30-Jul-1996 dyson

Backed out the recent changes/enhancements to the VM code. The
problem with the 'shell scripts' was found, but there was a 'strange'
problem found with a 486 laptop that we could not find. This commit
backs the code back to 25-jul, and will be re-entered after the snapshot
in smaller (more easily tested) chunks.


# 17329 29-Jul-1996 dyson

Fix a problem with a DEBUG section of code.


# 17325 29-Jul-1996 dyson

Fix an error in statement order in pmap_remove_pages, remove the pmap
pte hint (for now), and general code cleanup.


# 17321 28-Jul-1996 dyson

Fix a problem where a pmap update was not being done for the kernel_pmap. Also
remove some (currently) gratuitous tests for PG_V... This bug could
have caused various anomalous (temporary) behavior.


# 17294 27-Jul-1996 dyson

This commit is meant to solve a couple of VM system problems or
performance issues.

1) The pmap module has had too many inlines, and so the
object file is simply bigger than it needs to be.
Some common code is also merged into subroutines.
2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls.
Unfortunately, a few have needed to be added also.
The removal caused the need for more vm_page_lookups.
I added lookup hints to minimize the need for the
page table lookup operations.
3) Removal of some bogus performance improvements, that
mostly made the code more complex (tracking individual
page table page updates unnecessarily). Those improvements
actually hurt 386 processors' performance (not that people who
worry about perf use 386 processors anymore :-)).
4) Changed pv queue manipulations/structures to be TAILQ's.
5) The pv queue code has had some performance problems since
day one. Some significant scalability issues are resolved
by threading the pv entries from the pmap AND the physical
address instead of just the physical address. This makes
certain pmap operations run much faster. This does
not affect most micro-benchmarks, but should help loaded system
performance *significantly*. DG helped and came up with most
of the solution for this one.
6) Most if not all pmap bit operations follow the pattern:
pmap_test_bit();
pmap_clear_bit();
That made for twice the necessary pv list traversal. The
pmap interface now supports only pmap_tc_bit type operations:
pmap_[test/clear]_modified, pmap_[test/clear]_referenced.
Additionally, the modified routine now takes a vm_page_t arg
instead of a phys address. This eliminates a PHYS_TO_VM_PAGE
operation.
7) Several rewrites of routines that contain redundant code to
use common routines, so that there is a greater likelihood of
keeping the cache footprint smaller.


# 17121 12-Jul-1996 bde

Removed "optimization" using gcc's builtin memcpy instead of bcopy.
There is little difference now since the amount copied is large,
and bcopy will become much faster on some machines.


# 16747 26-Jun-1996 dyson

When page table pages were removed from process address space, the
resident page stats were not being decremented. This mod corrects
that problem.


# 16680 24-Jun-1996 dyson

Limit the scan for preloading pte's to the end of an object.


# 16471 17-Jun-1996 bde

Removed unused #includes of <i386/isa/icu.h> and <i386/isa/isa.h>. icu.h
is only used by the icu support modules and by a few drivers that know
too much about the icu (most only use it to convert `n' to `IRQn'). isa.h
is only used by ioconf.c and by a few drivers that know too much about
isa addresses (a few have to, because config is deficient).


# 16415 17-Jun-1996 dyson

Several bugfixes/improvements:
1) Make it much less likely to miss a wakeup in vm_page_free_wakeup
2) Create a new entry point into pmap: pmap_ts_referenced, eliminates
the need to scan the pv lists twice in many cases. Perhaps there
is a lot more to do here to work on minimizing pv list manipulation
3) Minor improvements to vm_pageout including the use of pmap_ts_ref.
4) Major changes and code improvement to pmap. This code has had
several serious bugs in page table page manipulation. In order
to simplify the problem, and hopefully solve it for once and all,
page table pages are no longer "managed" with the pv list stuff.
Page table pages are only (mapped and held/wired) or
(free and unused) now. Page table pages are never inactive,
active or cached. These changes have probably fixed the
hold count problems, but if they haven't, then the code is
simpler anyway for future bugfixing.
5) The pmap code has been sorely in need of re-organization, and I
have taken a first (of probably many) steps. Please tell me
if you have any ideas.


# 16324 12-Jun-1996 dyson

Fix a very significant cnt.v_wire_count leak in vm_page.c, and some
minor leaks in pmap.c. Bruce Evans made me aware of this problem.


# 16322 12-Jun-1996 gpalmer

Clean up -Wunused warnings.

Reviewed by: bde


# 16197 08-Jun-1996 dyson

Adjust the threshold for blocking on movement of pages from the cache
queue in vm_fault.

Move the PG_BUSY in vm_fault to the correct place.

Remove redundant/unnecessary code in pmap.c.

Properly block on rundown of page table pages, if they are busy.

I think that the VM system is in pretty good shape now, and the following
individuals (among others, in no particular order) have helped with this
recent bunch of bugs, thanks! If I left anyone out, I apologize!

Stephen McKay, Stephen Hocking, Eric J. Chet, Dan O'Brien, James Raynard,
Marc Fournier.


# 16166 07-Jun-1996 dyson

Fix a bug in the pmap_object_init_pt routine whereby pages weren't taken
from the cache queue before being mapped into the process.


# 16136 05-Jun-1996 dyson

I missed a case of the page table page dirty-bit fix.


# 16122 05-Jun-1996 dyson

Keep page-table pages from ever being sensed as dirty. This should fix
some problems with the page-table page management code, since it can't
deal with the notion of page-table pages being paged out or in transit.
Also, clean up some stylistic issues per some suggestions from
Stephen McKay.


# 16079 02-Jun-1996 dyson

Don't carry the modified or referenced bits through to the child
process during pmap_copy. This minimizes unnecessary swapping or creation of
swap space. If there is a hold_count flaw for page-table
pages, clear the page before freeing it to lessen the chance of a system
crash -- this is a robustness thing only, NOT a fix.


# 16057 01-Jun-1996 dyson

Fix the problem with pmap_copy that breaks X in small memory machines. Also
close some windows that are opened up by page table allocations. The
prefaulting code no longer uses hold counts, but now uses the busy
flag for synchronization.


# 16026 30-May-1996 dyson

This commit is dual-purpose, to fix more of the pageout daemon
queue corruption problems, and to apply Gary Palmer's code cleanups.
David Greenman helped with these problems also. There is still
a hang problem using X in small memory machines.


# 15977 29-May-1996 dyson

The wrong address (pindex) was being used for the page table directory. No
negative side effects right now, but just a clean-up.


# 15868 22-May-1996 peter

Fix a harmless warning: pmap_nw_modified was not having its arg
cast to pt_entry_t like the others inside the DIAGNOSTIC code.


# 15858 22-May-1996 dyson

A serious error in pmap.c(pmap_remove) is corrected by this. When
comparing the PTD pointers, they needed to be masked by PG_FRAME, and
they weren't. Also, the "improved" non-386 code wasn't really an
improvement, so I simplified and fixed the code. This might have
caused some of the panics caused by the VM megacommit.


# 15832 20-May-1996 dyson

To quote Stephen McKay: pmap_copy is a complex NOP at this moment :-).

With this fix from Stephen, we are getting the target fork performance
that I have been trying to attain: P5-166, before the mega-commit: 700-800usecs,
after: 600usecs, with Stephen's fix: 500usecs!!! Also, this could be the
solution of some strange panic problems...
Reviewed by: dyson@freebsd.org
Submitted by: Stephen McKay <syssgm@devetir.qld.gov.au>


# 15819 19-May-1996 dyson

Initial support for mincore and madvise. Both are almost fully
supported, except madvise does not page in with MADV_WILLNEED, and
MADV_DONTNEED doesn't force dirty pages out.
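
From userland the pair is exercised like this (sketch; vec needs one byte
per page of the range):

    #include <sys/types.h>
    #include <sys/mman.h>

    madvise(addr, len, MADV_WILLNEED);   /* hint only; no page-in yet, per above */
    mincore(addr, len, vec);             /* residency byte per page into vec */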


# 15809 18-May-1996 dyson

This set of commits to the VM system does the following, and contain
contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>,
Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me:

More usage of the TAILQ macros. Additional minor fix to queue.h.
Performance enhancements to the pageout daemon.
Addition of a wait in the case that the pageout daemon
has to run immediately.
Slightly modify the pageout algorithm.
Significant revamp of the pmap/fork code:
1) PTE's and UPAGES's are NO LONGER in the process's map.
2) PTE's and UPAGES's reside in their own objects.
3) TOTAL elimination of recursive page table pagefaults.
4) The page directory now resides in the PTE object.
5) Implemented pmap_copy, thereby speeding up fork time.
6) Changed the pv entries so that the head is a pointer
and not an entire entry.
7) Significant cleanup of pmap_protect, and pmap_remove.
8) Removed significant amounts of machine dependent
fork code from vm_glue. Pushed much of that code into
the machine dependent pmap module.
9) Support more completely the reuse of already zeroed
pages (Page table pages and page directories) as being
already zeroed.
Performance and code cleanups in vm_map:
1) Improved and simplified allocation of map entries.
2) Improved vm_map_copy code.
3) Corrected some minor problems in the simplify code.
Implemented splvm (combo of splbio and splimp.) The VM code now
seldom uses splhigh.
Improved the speed of and simplified kmem_malloc.
Minor mod to vm_fault to avoid using pre-zeroed pages in the case
of objects with backing objects along with the already
existant condition of having a vnode. (If there is a backing
object, there will likely be a COW... With a COW, it isn't
necessary to start with a pre-zeroed page.)
Minor reorg of source to perhaps improve locality of ref.


# 15583 03-May-1996 phk

Another sweep over the pmap/vm macros, this time with more focus on
the usage. I'm not satisfied with the naming, but now at least there is
less bogus stuff around.


# 15565 02-May-1996 phk

Move atdevbase out of locore.s and into machdep.c
Macroize locore.s' page table setup even more, now it's almost readable.
Rename PG_U to PG_A (so that I can...)
Rename PG_u to PG_U. "PG_u" was just too ugly...
Remove some unused vars in pmap.c
Remove PG_KR and PG_KW
Remove SSIZE
Remove SINCR
Remove BTOPKERNBASE

This concludes my spring cleaning, modulus any bug fixes for messes I
have made on the way.

(Funny to be back here in pmap.c, that's where my first significant
contribution to 386BSD was... :-)


# 15543 02-May-1996 phk

removed:
CLBYTES PD_SHIFT PGSHIFT NBPG PGOFSET CLSIZELOG2 CLSIZE pdei()
ptei() kvtopte() ptetov() ispt() ptetoav() &c &c
new:
NPDEPG

Major macro cleanup.


# 15340 22-Apr-1996 dyson

This fixes a troubling oversight in some of the pmap code enhancements.
One of the manifestations of the problem is the -4 RSS problem
in ps.

Reviewed by: dyson
Submitted by: Stephen McKay <syssgm@devetir.qld.gov.au>


# 15088 07-Apr-1996 dyson

Major cleanups for the pmap code.


# 14967 31-Mar-1996 dg

Change if/goto into a while loop.
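
A generic before-and-after illustration of this kind of cleanup
(purely illustrative; more_work() and do_work() are made-up stand-ins,
not the committed code):

/* Before: a conditional and a backwards goto emulating a loop. */
top:
	if (more_work()) {
		do_work();
		goto top;
	}

/* After: the same control flow expressed as a while loop. */
	while (more_work())
		do_work();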


# 14868 28-Mar-1996 dyson

Remove a now unnecessary prototype from pmap.c. Also remove now
unnecessary vm_fault's of page table pages in trap.c.


# 14867 28-Mar-1996 dyson

Significant code cleanup, and some performance improvement. Also,
mlock will now work properly without killing the system.


# 14607 12-Mar-1996 dyson

Make sure that we pmap_update AFTER modifying the page table entries.
The P6 can do a serious job of reordering code, and our stuff could
execute incorrectly.
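
The constraint fits in two lines; an illustrative fragment (pte and
newpte are stand-in names, and pmap_update() is the TLB flush):

	/*
	 * Store the new PTE first, then invalidate the TLB. If the
	 * invalidation were reordered ahead of the store, the CPU
	 * could keep translating through the stale entry.
	 */
	*pte = newpte;		/* 1: modify the page table entry */
	pmap_update();		/* 2: THEN flush the TLB */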


# 14527 11-Mar-1996 hsu

For Lite2: proc LIST changes.
Reviewed by: david & bde


# 14470 10-Mar-1996 dyson

Improved efficiency in pmap_remove, and also removed some of the pmap_update
optimizations that were probably incorrect.


# 14433 09-Mar-1996 dyson

Correct some new and older lurking bugs. Hold count wasn't being
handled correctly. Fix some incorrect code that was included
to improve performance. Significantly simplify the pmap_use_pt and
pmap_unuse_pt subroutines. Add some more diagnostic code.


# 14245 25-Feb-1996 dyson

Re-insert a missing pmap_remove operation.


# 14243 25-Feb-1996 dyson

Fix a problem with tracking the modified bit. Eliminate the
ugly inline-asm code, and speed up the page-table-page tracking.


# 13908 04-Feb-1996 dg

Rewrote cpu_fork so that it doesn't use pmap_activate, and removed
pmap_activate since it's not used anymore. Changed cpu_fork so that
it uses one line of inline assembly rather than calling mvesp() to
get the current stack pointer. Removed mvesp() since it is no longer
being used.
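
The replacement one-liner was presumably something like the following
GCC inline assembly (an illustrative reconstruction, not the committed
code):

	unsigned int esp;

	/* Read the current stack pointer directly, no function call. */
	__asm__ __volatile__("movl %%esp, %0" : "=r" (esp));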


# 13495 19-Jan-1996 peter

Some trivial fixes to get it to compile again, plus some new lint:
- cpuclass should be cpu_class
- CPUCLASS_I386 should be CPUCLASS_386
(^^ those only show up if you compile for i386)
- two missing prototypes on new functions
- one missing static


# 13490 19-Jan-1996 dyson

Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
overhead for merged cache.
Efficiency improvement for vfs_cluster. It used to do a lot of redundant
calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective TLB update for 486/586/P6 (see the sketch after this entry).
Numerous fixes to the size of objects allocated for files. Additionally,
fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources. The pageout code
will not thrash as readily.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
that happens every 30 seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
buffers.
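
On the selective TLB update mentioned above: the 486 and later can
invalidate a single stale translation with invlpg instead of reloading
%cr3 and discarding the whole TLB. A sketch of the idea (the function
name is illustrative):

	/*
	 * Invalidate the TLB entry for one virtual address. The
	 * 386 lacks invlpg and must reload %cr3 instead.
	 */
	static __inline void
	invltlb_1pg(vm_offset_t va)
	{
		__asm__ __volatile__("invlpg %0" : : "m" (*(char *)va)
		    : "memory");
	}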


# 12978 22-Dec-1995 bde

Staticized code that was hidden by `#ifdef DEBUG'.


# 12904 17-Dec-1995 bde

Fixed 1TB filesize changes. Some pindexes had bogus names and types
but worked because vm_pindex_t is indistinguishable from vm_offset_t.


# 12850 14-Dec-1995 bde

Added a prototype. Merged prototype lists.


# 12767 11-Dec-1995 dyson

Changes to support 1TB file sizes. Pages are now named by an
(object,index) pair instead of (object,offset) pair.
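
The renaming is a straight rescaling - a page's index is its byte
offset in page-sized units. Later FreeBSD spells the conversion with
macros along these lines (quoted from memory, so treat the exact
spelling as illustrative):

	#define	OFF_TO_IDX(off)	((vm_pindex_t)((off) >> PAGE_SHIFT))
	#define	IDX_TO_OFF(idx)	((vm_ooffset_t)(idx) << PAGE_SHIFT)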


# 12722 10-Dec-1995 phk

Staticize and clean up.
Remove a TON of #includes from machdep.


# 12662 07-Dec-1995 dg

Untangled the vm.h include file spaghetti.


# 12607 03-Dec-1995 bde

Completed function declarations and/or added prototypes.


# 12417 20-Nov-1995 phk

Remove unused vars.


# 11703 23-Oct-1995 dg

Remove PG_W bit setting in some cases where it should not be set.

Submitted by: John Dyson <dyson>


# 11693 22-Oct-1995 dg

More improvements to the logic for modify-bit checking. Removed
pmap_prefault() code as we don't plan to use it at this point in time.

Submitted by: John Dyson <dyson>


# 11642 22-Oct-1995 dg

Simplified some expressions.


# 10782 15-Sep-1995 dg

1) Killed 'BSDVM_COMPAT'.
2) Killed i386pagesperpage as it is not used by anything.
3) Fixed benign miscalculations in pmap_bootstrap().
4) Moved allocation of ISA DMA memory to machdep.c.
5) Removed bogus vm_map_find()'s in pmap_init() - the entire range was
already allocated by kmem_init().
6) Added some comments.

virtual_avail is still miscalculated (NKPT*NBPG too large), but fixing
this properly requires moving the variable initialization into locore.s.
Some other day.


# 9744 28-Jul-1995 dg

Fixed bug I introduced with the memory-size code rewrite that broke
floppy DMA buffers...use avail_start not "first". Removed duplicate
(and wrong) declaration of phys_avail[].

Submitted by: Bruce Evans, but fixed differently by me.


# 9507 13-Jul-1995 dg

NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!

Much needed overhaul of the VM system. Included in this first round of
changes:

1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".

2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
essentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, an un_pager structure union
was created in the object to contain these items.

3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.

4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.

5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.

6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.

7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.

8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.

9) Some almost useless debugging code removed.

10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure essentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.

11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.

12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although it will need some additional decision code to do
this, of course).

13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.

14) (dyson) Sharing maps have been removed. Its marginal usefulness in a
threads design can be worked around in other ways. Both #13 and #14
were done to simplify the code and improve readability and maintain-
ability. (As were most all of these changes)

TODO:

1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.

2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.

3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.

4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.

5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).


# 8876 30-May-1995 rgrimes

Remove trailing whitespace.


# 8456 11-May-1995 rgrimes

Fix -Wformat warnings from LINT kernel.


# 7693 09-Apr-1995 dg

Cosmetic changes.


# 7490 30-Mar-1995 dg

Made pmap_testbit a static function.


# 7402 26-Mar-1995 dg

Changed pmap_changebit() into a static function as it always should
have been.

Submitted by: John Dyson


# 7090 16-Mar-1995 bde

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


# 6977 10-Mar-1995 dg

Removed unnecessary routines vm_get_pmap() and vm_put_pmap().
kmem_alloc() returns zero-filled memory, so there is no need to explicitly
bzero() it.


# 6807 01-Mar-1995 dg

Various changes from John and myself that do the following:

New functions created - vm_object_pip_wakeup and pagedaemon_wakeup - that
are used to reduce the actual number of wakeups.
New function vm_page_protect, which is used in conjunction with some new
page flags to reduce the number of calls to pmap_page_protect.
Minor changes to reduce unnecessary spl nesting.
Rewrote vm_page_alloc() to improve readability.
Various other mostly cosmetic changes.


# 6734 26-Feb-1995 bde

Replace all remaining instances of `i386/include' by `machine' and fix
nearby #include inconsistencies.


# 6423 15-Feb-1995 dg

Killed the pmap_use_pt and pmap_unuse_pt prototypes as they are now in
machine/pmap.h.


# 6126 02-Feb-1995 dg

Mostly cosmetic changes. Use KERNBASE instead of UPT_MAX_ADDRESS in
some comparisons as it is more correct (we want the kernel page tables
included).
Reorganized some of the expressions for efficiency.
Fixed the new pmap_prefault() routine - it would sometimes pick up the
wrong page if the page in the shadow was present but the page in the object
was paged out. The routine remains unused and commented out, however.
Explicitly free zero reference count page tables (rather than waiting
for the pagedaemon to do it).

Submitted by: John Dyson


# 5943 26-Jan-1995 dg

Fix from Doug Rabson for a panic related to not initializing the kernel's
PTD.

Submitted by: John Dyson


# 5916 25-Jan-1995 dg

Comment out pmap_prefault for the time being (perhaps until after 2.1).
The object_init_pt routine is still enabled and used, however, and this
is where most of the 'pre-faulting' performance improvement comes from.


# 5914 25-Jan-1995 dg

Make sure that the pages being 'pre-faulted' are currently on a queue.


# 5910 25-Jan-1995 dg

Be a bit less fast and loose about setting non-cacheability of pages.


# 5837 24-Jan-1995 dg

Changed buffer allocation policy (machdep.c)
Moved various pmap 'bit' test/set functions back into real functions; gcc
generates better code at the expense of more of it. (pmap.c)
Fixed a deadlock problem with pv entry allocations (pmap.c)
Added a new, optional function 'pmap_prefault' that does clustered page
table preloading (pmap.c)
Changed the way that page tables are held onto (trap.c).

Submitted by: John Dyson


# 5642 15-Jan-1995 dg

Fixed some page table reference count problems; these changes may not be
complete, but should be closer to correct than before.


# 5580 14-Jan-1995 dg

Add missing object_lock/unlock.


# 5455 09-Jan-1995 dg

These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.

The majority of the merged VM/cache work is by John Dyson.

The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.

vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.

vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.

vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.

vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.

vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.

pmap.c vm_map.c
Dynamic kernel VM size, now we don't have to pre-allocate excessive numbers of
kernel PTs.

vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.

proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.

swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.

machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.

machdep.c, kern_exec.c, vm_glue.c
Implemented a separate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.

ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.

Submitted by: John Dyson and David Greenman


# 5153 18-Dec-1994 dg

Move page_unhold's in pmap_object_init_pt down one line to guard against
a potential race condition.


# 5144 18-Dec-1994 dg

Check for PG_FAKE too in pmap_object_init_pt.


# 3436 08-Oct-1994 phk

db_disasm.c: Unused var zapped.
pmap.c: tons of unused vars zapped, various other warnings silenced.
trap.c: unused vars zapped.
vm_machdep.c: A wrong argument, which by chance did the right thing, was
corrected.


# 2826 16-Sep-1994 dg

Removed inclusion of pio.h and cpufunc.h (cpufunc.h is included from
systm.h). Merged functionality of pio.h into cpufunc.h. Cleaned up some
related code.


# 2492 04-Sep-1994 dg

Added pmap_mapdev() function to map device memory.
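
Usage is a single call that returns a kernel virtual address for a
range of physical device memory; a sketch (the physical address and
size are made up for illustration):

	void *regs;

	/* Map one page of a device's memory-mapped registers. */
	regs = pmap_mapdev(0xfe000000, 0x1000);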


# 2455 02-Sep-1994 dg

Removed all vestiges of tlbflush(). Replaced them with calls to pmap_update().
Made pmap_update an inline assembly function.
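
On the i386, flushing the entire TLB means reloading %cr3 with its own
value, so the inline version plausibly looked something like this (an
illustrative reconstruction, not the committed code):

	static __inline void
	pmap_update(void)
	{
		unsigned long cr3;

		/* Rewriting %cr3 discards all cached translations. */
		__asm__ __volatile__("movl %%cr3, %0; movl %0, %%cr3"
		    : "=r" (cr3) : : "memory");
	}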


# 2112 18-Aug-1994 wollman

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


# 2056 13-Aug-1994 wollman

Change all #includes to follow the current Berkeley style. Some of these
``changes'' are actually not changes at all, but CVS sometimes has trouble
telling the difference.

This also includes support for second-directory compiles. This is not
quite complete yet, as `config' doesn't yet do the right thing. You can
still make it work trivially, however, by doing the following:

rm /sys/compile
mkdir /usr/obj/sys/compile
ln -s M-. /sys/compile
cd /sys/i386/conf
config MYKERNEL
cd ../../compile/MYKERNEL
ln -s /sys @
rm machine
ln -s @/i386/include machine
make depend
make


# 1896 07-Aug-1994 dg

Made pmap_kenter "TLB safe". ...and then removed all the pmap_updates that
are no longer needed because of this.


# 1895 07-Aug-1994 dg

Provided support for the upcoming merged VM/buffer cache, and fixed a few bugs
that don't appear to have manifested themselves (yet).

Submitted by: John Dyson


# 1890 06-Aug-1994 dg

Fixed various prototype problems with the pmap functions and the subsequent
problems that fixing them caused.


# 1887 06-Aug-1994 dg

Incorporated post 1.1.5 work from John Dyson. This includes performance
improvements via the new routines pmap_qenter/pmap_qremove and pmap_kenter/
pmap_kremove. These routines allow fast mapping of pages for those
architectures that have "normal" MMUs. Also included is a fix to the
pageout daemon to properly check a queue end condition.

Submitted by: John Dyson


# 1825 03-Aug-1994 dg

Merged in post-1.1.5 work done by myself and John Dyson. This includes:

me:
1) TLB flush optimization that effectively eliminates half of all of the
TLB flushes. This works by only flushing the TLB when a page is "present"
in memory (i.e. the valid bit is set in the page table entry). See section
5.3.5 of the Intel 386 Programmer's Reference Manual, and the sketch
after this entry.
2) The handling of "CMAP" has been improved to catch attempts at multiple
simultaneous use.

John:
1) Added pmap_qenter/pmap_qremove functions for fast mapping of pages into
the kernel. This is for future optimizations and support for the upcoming
merged VM/buffer cache.

Reviewed by: John Dyson
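
The sketch promised in item 1: a translation can only be cached in the
TLB if the old PTE was valid, because the 386 does not cache
not-present entries. Names here are illustrative:

	oldpte = *pte;
	*pte = newpte;
	if (oldpte & PG_V)	/* mapping was previously valid?   */
		pmap_update();	/* only then can the TLB be stale  */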


# 1549 25-May-1994 rgrimes

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# 1440 02-May-1994 dg

Removed some tlbflush optimizations as some of them were bogus and led
to some strange behavior.


# 1379 20-Apr-1994 dg

Bug fixes and performance improvements from John Dyson and myself:

1) check va before clearing the page clean flag. Not doing so was
causing the vnode pager error 5 messages when paging from
NFS. (pmap.c)
2) put back interrupt protection in idle_loop. Bruce didn't think
it was necessary, John insists that it is (and I agree). (swtch.s)
3) various improvements to the clustering code (vm_machdep.c). It's
now enabled/used by default.
4) bad disk blocks are now handled properly when doing clustered IOs.
(wd.c, vm_machdep.c)
5) bogus bad block handling fixed in wd.c.
6) algorithm improvements to the pageout/pagescan daemons. It's amazing
how well 4MB machines work now.


# 1362 14-Apr-1994 dg

Changes from John Dyson and myself:

1) Removed all instances of disable_intr()/enable_intr() and changed
them back to splimp/splx. The previous method was done to improve
the performance, but Bruce's recent changes to inline spl* have
made this unnecessary.
2) Cleaned up vm_machdep.c considerably. Probably fixed a few bugs, too.
3) Added a new mechanism for collecting page statistics - now done by
a new system process "pagescan". Previously this was done by the
pageout daemon, but this proved to be impractical.
4) Improved the page usage statistics gathering mechanism - performance is
much improved in small memory machines.
5) Modified mbuf.h to enable the support for an external free routine when
using mbuf clusters. Added appropriate glue in various places to
allow this to work.
6) Adapted a suggested change to the NFS code from Yuval Yurom to take
advantage of #5.
7) Added fault/swap statistics support.


# 1312 30-Mar-1994 dg

New routine "pmap_kenter", designed to take advantage of the special
case of the kernel pmap.
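
The special case is that the kernel page tables are always resident
and shared by every process, so a kernel-only enter can write the PTE
directly, with none of the pv-entry or page-table-fault machinery. A
minimal sketch under those assumptions (helpers and flags as in later
sources, so treat the details as illustrative):

	/*
	 * Enter a wired kernel mapping: locate the kernel PTE for
	 * va and store the frame address plus protection bits.
	 */
	void
	pmap_kenter(vm_offset_t va, vm_offset_t pa)
	{
		pt_entry_t *pte = vtopte(va);

		*pte = pa | PG_RW | PG_V;
	}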


# 1262 14-Mar-1994 dg

Performance improvements from John Dyson.

1) A new mechanism has been added to prevent pages from being paged
out called "vm_page_hold". Similar to vm_page_wire, but
much lower overhead.
2) Scheduling algorithm has been changed to improve interactive
performance.
3) Paging algorithm improved.
4) Some vnode and swap pager bugs fixed.


# 1246 07-Mar-1994 dg

1) "Pre-faulting" in of pages into process address space
Eliminates vm_fault overhead on process startup and
mmap referenced data for in-memory pages.

(process startup time using in-memory segments *much* faster)

2) Even more efficient pmap code. Code partially cleaned up.
More comments yet to follow.

(generally more efficient pte management)

3) Pageout clustering (in addition to the FreeBSD V1.1 pagein
clustering).

(much faster paging performance on non-write behind disk
subsystems, slightly faster performance on other systems.)

4) Slightly changed vm_pageout code for more efficiency and
better statistics. Also, resist swapout a little more.

(less likely to pageout a recently used page)

5) Slight improvement to the page table page trap efficiency.

(generally faster system VM fault performance)

6) Defer creation of unnamed anonymous regions pager until needed.

(speeds up shared memory bss creation)

7) Remove possible deadlock from swap_pager initialization.

8) Enhanced procfs to provide "vminfo" about vm objects and user
pmaps.

9) Increased MCLSHIFT/MCLBYTES from 2K to 4K to improve net &
socket performance and to prepare for things to come.

John Dyson
dyson@implode.root.com
David Greenman
davidg@root.com


# 1151 13-Feb-1994 dg

Fixed bug in handling of COW - the original code was bogus and it was
only accidental that it worked. Also, don't cache non-managed pages.


# 1139 10-Feb-1994 dg

Patch from John Dyson:

a pv chain was being traversed while interrupts were
fully enabled in pmap_remove_all ... this is bogus, and
has been fixed in pmap.c. (sorry for adding the splimp)


# 1124 08-Feb-1994 dg

Fixes from John Dyson to fix out-of-memory hangs and other problems (such
as increased swap space usage) related to (incorrectly) paging out the
page tables.


# 1055 31-Jan-1994 dg

Added a four-pattern memory test routine that is run at startup.
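
The conventional four patterns are all-zeroes, all-ones, and the two
alternating-bit values; a hedged reconstruction of the approach (names
and patterns are the usual choice, not necessarily the committed ones):

	static const unsigned int patterns[4] = {
		0x00000000, 0xffffffff, 0x55555555, 0xaaaaaaaa
	};

	/* Return 1 if every word of the page holds all four patterns. */
	static int
	test_page(volatile unsigned int *page, int nwords)
	{
		int i, j;

		for (j = 0; j < 4; j++)
			for (i = 0; i < nwords; i++) {
				page[i] = patterns[j];
				if (page[i] != patterns[j])
					return (0);
			}
		return (1);
	}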


# 1045 31-Jan-1994 dg

VM system performance improvements from John Dyson and myself. The
following is a summary:

1) increased object cache back up to a more reasonable value.
2) removed old & bogus cruft from machdep.c (clearseg, copyseg,
physcopyseg, etc).
3) inlined many functions in pmap.c
4) changed "load_cr3(rcr3())" into tlbflush() and made tlbflush inline
assembly.
5) changed the way that modified pages are tracked - now vm_page struct
is kept updated directly - no more scanning page tables.
6) removed lots of unnecessary spl's
7) removed old unused functions from pmap.c
8) removed all use of page_size, page_shift, page_mask variables - replaced
with PAGE_ constants.
9) moved trunc/round_page, atop, ptoa, out of vm_param.h and into i386/
include/param.h, and optimized them.
10) numerous changes to sys/vm/ swap_pager, vnode_pager, pageout, fault
code to improve performance. LRU algorithm modified to be more
effective, read ahead/behind values tuned for better performance,
etc, etc...


# 1028 27-Jan-1994 dg

Made pmap_is_managed a static inline function.


# 981 17-Jan-1994 dg

Improvements mostly from John Dyson, with a little bit from me.

* Removed pmap_is_wired
* added extra cli/sti protection in idle (swtch.s)
* slight code improvement in trap.c
* added lots of comments
* improved paging and other algorithms in VM system


# 974 14-Jan-1994 dg

"New" VM system from John Dyson & myself. For a run-down of the
major changes, see the log of any affected file in the sys/vm
directory (swap_pager.c for instance).


# 879 18-Dec-1993 wollman

Make everything compile with -Wtraditional. Make it easier to distribute
a binary link-kit. Make all non-optional options (pagers, procfs) standard,
and update LINT to reflect new symtab requirements.

NB: -Wtraditional will henceforth be forgotten. This editing pass was
primarily intended to detect any constructions where the old code might
have been relying on traditional C semantics or syntax. These were all
fixed, and the result of fixing some of them means that -Wall is now a
realistic possibility within a few weeks.


# 856 13-Dec-1993 dg

added some panics to catch the condition where pmap_pte returns null
- indicating that the page table page is non-resident.


# 798 24-Nov-1993 wollman

Make the LINT kernel compile with -W -Wreturn-type -Wcomment -Werror, and
add same (sans -Werror) to Makefile for future compilations.


# 757 13-Nov-1993 dg

First steps in rewriting locore.s, and making info useful
when the machine panics.

i386/i386/locore.s:
1) got rid of most .set directives that were being used like
#define's, and replaced them with appropriate #define's in
the appropriate header files (accessed via genassym).
2) added comments to header inclusions and global definitions,
and global variables
3) replaced some hardcoded constants with cpp defines (such as
PDESIZE and others)
4) aligned all comments to the same column to make them easier to
read
5) moved macro definitions for ENTRY, ALIGN, NOP, etc. to
/sys/i386/include/asmacros.h
6) added #ifdef BDE_DEBUGGER around all of Bruce's debugger code
7) added new global '_KERNend' to store last location+1 of kernel
8) cleaned up zeroing of bss so that only bss is zeroed
9) fix zeroing of page tables so that it really does zero them all
- not just if they follow the bss.
10) rewrote page table initialization code so that 1) works correctly
and 2) write protects the kernel text by default
11) properly initialize the kernel page directory, upages, p0stack PT,
and page tables. The previous scheme was more than a bit
screwy.
12) change allocation of virtual area of IO hole so that it is
fixed at KERNBASE + 0xa0000. The previous scheme put it
right after the kernel page tables and then later expected
it to be at KERNBASE + 0xa0000.
13) change multiple bogus settings of user read/write of various
areas of kernel VM - including the IO hole; we should never
be accessing the IO hole in user mode through the kernel
page tables
14) split kernel support routines such as bcopy, bzero, copyin,
copyout, etc. into a separate file 'support.s'
15) split swtch and related routines into a separate 'swtch.s'
16) split routines related to traps, syscalls, and interrupts
into a separate file 'exception.s'
17) remove some unused global variables from locore that got
inserted by Garrett when he pulled them out of some .h
files.

i386/isa/icu.s:
1) clean up global variable declarations
2) move in declaration of astpending and netisr

i386/i386/pmap.c:
1) fix calculation of virtual_avail. It previously was calculated
to be right in the middle of the kernel page tables - not
a good place to start allocating kernel VM.
2) properly allocate kernel page dir/tables etc out of kernel map
- previously only took out 2 pages.

i386/i386/machdep.c:
1) modify boot() to print a warning that the system will reboot in
PANIC_REBOOT_WAIT_TIME seconds, and let the user
abort with a key on the console. The machine will wait
forever if a key is typed before the reboot. The default is
15 seconds, but can be set to 0 to mean don't wait at all,
-1 to mean wait forever, or any positive value to wait for
that many seconds.
2) print "Rebooting..." just before doing it.

kern/subr_prf.c:
1) remove PANICWAIT as it is deprecated by the change to machdep.c

i386/i386/trap.c:
1) add table of trap type strings and use it to print a real trap/
panic message rather than just a number. Lots of work to
be done here, but this is the first step. Symbolic traceback
is in the TODO.

i386/i386/Makefile.i386:
1) add support in to build support.s, exception.s and swtch.s

...and various changes to various header files to make all of the
above happen.


# 608 15-Oct-1993 rgrimes

genassym.c:
Remove NKMEMCLUSTERS; it is no longer defined or used.

locore.s:
Fix comment on PTDpde and APTDpde to be pde instead of pte
Add new equation for calculating location of Sysmap
Remove Bill's old #ifdef garbage for counting up memory;
that stuff will never be made to work and was just cluttering
up the file.

Add code that places the PTD, page table pages, and kernel
stack below the 640k ISA hole if there is room for it, otherwise
put this stuff all at 1MB. This fixes the 28K bogusity in
the boot blocks, that can now go away!

Fix the calculation of where first is to be dependent on
NKPDE so that we can skip over the above mentioned areas.
The 28K thing is now 44K in size due to the increase in
kernel virtual memory space, but since we no longer have
to worry about that this is no big deal.

Use if NNPX > 0 instead of ifdef NPX for floating point code.

machdep.c
Change the calculation for the buffer cache to be
20% of all memory above 2MB and add back the upper limit
of 2/5's of the VM_KMEM_SIZE so that we do not eat ALL
of the kernel memory space on large memory machines; note
that this will not even come into effect unless you have
more than 32MB. The current buffer cache limit is 6.7MB
due to this calculation.

It seems that we were erroneously allocating bufpages pages
for buffer_map. buffer_map is UNUSED in this implementation
of the buffer cache, but since the map is referenced in
several if statements a quick fix was to simply allocate
1 vm page (but no real memory) to it.

pmap.h
Remove rcsid, don't want them in the kernel files!

Removed some cruft inside an #ifdef DEBUGx that caused
compiler errors if you were compiling this for debug.

Use the #defines for PD_SHIFT and PG_SHIFT in place of
constants.

trap.c:
Remove patch kit header and rcsid, fix $Id$.
Now include "npx.h" and use NNPX for controlling the
floating point code.

Remove a now completely invalid check for a maximum virtual
address, the virtual address now ends at 0xFFFFFFFF so
there is no more MAX!! (Thanks David, I completely missed
that one!)

vm_machdep.c
Remove patch kit header and rcsid, fix $Id$.
Now include "npx.h" and use NNPX for controlling the
floating point code.

Replace several 0xFE00000 constants with KERNBASE


# 589 12-Oct-1993 rgrimes

KPTDI_LAST renamed to KPTDI


# 587 12-Oct-1993 rgrimes

Eliminate definition of I386_PAGE_SIZE and use NBPG instead
Replace 0xFE000000 constants with KERNBASE
Use new definition NKPDE in place of a first-last+1 calculation.


# 528 30-Sep-1993 rgrimes

This is a fix for the 32K DMA buffer region that was not accounted for;
it relocates the region to be after the BIOS memory hole instead of right below
the 640K limit.
THANK YOU CHRIS!!!

From: <cgd@postgres.Berkeley.EDU>
Date: Wed, 29 Sep 93 18:49:58 -0700
basically, reserve a new 32k space right after firstaddr,
and put the buffer space there...

the diffs are below, and are in ~cgd/sys/i386/i386 (in machdep.c)
on freefall. i obviously can't test them, so if some of you would
look the diffs over and try them out...


# 424 08-Sep-1993 rgrimes

Changed the pg("ptdi> %x") to a printf and then a panic, since we are
going to panic shortly after this anyway. Destroys less state, and
keeps the machine from waiting for someone to smash the return key
a few times before it panics!


# 200 27-Jul-1993 dg

* Applied fixes from Bruce Evans to fix COW bugs, >1MB kernel loading,
profiling, and various protection checks that cause security holes
and system crashes.
* Changed min/max/bcmp/ffs/strlen to be static inline functions
- included from cpufunc.h via systm.h. This change
improves performance in many parts of the kernel - up to 5% in the
networking layer alone. Note that this requires systm.h to be included
in any file that uses these functions; otherwise it won't be able to
find them during the load.
* Fixed incorrect call to splx() in if_is.c
* Fixed bogus variable assignment to splx() in if_ed.c


# 5 12-Jun-1993 rgrimes

This commit was generated by cvs2svn to compensate for changes in r4,
which included commits to RCS files with non-trunk default branches.


# 4 12-Jun-1993 rgrimes

Initial import, 0.1 + pk 0.2.4-B1