History log of /freebsd-9.3-release/sys/ia64/ia64/pmap.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 267654 19-Jun-2014 gjb

Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 251963 18-Jun-2013 marius

MFC: r238190

Implement ia64_physmem_alloc() and use it consistently to get memory
before VM has been initialized.


# 251897 18-Jun-2013 scottl

Merge the second part of the unmapped I/O changes. This enables the
infrastructure in the block layer and UFS filesystem as well as a few
drivers. The list of MFC revisions is long, so I won't quote changelogs.

r248508,248510,248511,248512,248514,248515,248516,248517,248518,
248519,248520,248521,248550,248568,248789,248790,249032,250936

Submitted by: kib
Approved by: kib
Obtained from: Netflix


# 248814 28-Mar-2013 kib

MFC r248280:
Add pmap function pmap_copy_pages().


# 248085 09-Mar-2013 marius

MFC: r227309 (partial)

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# 240238 08-Sep-2012 kib

MFC r239065:
Stop including vm_param.h into vm_page.h. Include vm_param.h
explicitely for the kernel code which needs it.


# 226319 12-Oct-2011 kib

Handle page dirty mask with atomics.

MFC r225838:
Use the explicitly-sized types for the dirty and valid masks.

MFC r225840:
Use the trick of performing the atomic operation on the contained aligned
word to handle the dirty mask updates in vm_page_clear_dirty_mask().

MFC 225841
Remove locking of the vm page queues from several pmaps.

MFC r225843:
Fix grammar.

MFC r225856:
Style nit.

Approved by: re (bz)


# 225736 22-Sep-2011 kensmith

Copy head to stable/9 as part of 9.0-RELEASE release cycle.

Approved by: re (implicit)


# 225418 06-Sep-2011 kib

Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by: alc, attilio
Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by: re (bz)


# 224746 09-Aug-2011 kib

- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag
to VPO_UNMANAGED (and also making the flag protected by the vm object
lock, instead of vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and
VPO_UNMANAGED. As a consequence, pmap code now can use use just
VPO_UNMANAGED to decide whether the page is unmanaged.

Reviewed by: alc
Tested by: pho (x86, previous version), marius (sparc64),
marcel (arm, ia64, powerpc), ray (mips)
Sponsored by: The FreeBSD Foundation
Approved by: re (bz)


# 224663 05-Aug-2011 marcel

Follow-up commit: refactor pmap_kextract() to make it easier to
catch and debug issues like the one fixed in the previous commit:
Replace all return statements with goto statements so that we end
up at a single place with a value for the physical address.
Print a message for all unknown KVA addresses.

Approved by: re (blanket)


# 224662 05-Aug-2011 marcel

Remove stray semicolon in pmap_kextract() that turned the conditional
"return (0)" into an unconditional one and as such broke PBVM address
queries -- such as during kernel core dumps.

Approved by: re (blanket)


# 224116 16-Jul-2011 marcel

Don't assume pmap_mapdev() gets called only for memory mapped I/O
addresses (i.e. uncacheable). ACPI in particular uses pmap_mapdev()
rather excessively (number of calls) just to get a valid KVA. To
that end, have pmap_mapdev():
1. cache the last result so that we don't waste time for multiple
consecutive invocations with the same PA/SZ.
2. find the memory descriptor that covers the PA and return NULL
if none was found or when the PA is for a common DRAM address.
3. Use either a region 6 or region 7 KVA, in accordance with the
memory attribute.


# 223873 08-Jul-2011 marcel

Implement basic support for memory attributes. At this time we only
distinguish between UC and WB memory so that we can map the page to
either a region 6 address (for UC) or a region 7 address (for WB).

This change is only now possible, because previously we would map
regions 6 and 7 with 256MB translations and on top of that had the
kernel mapped in region 7 using a wired translation. The introduction
of the PBVM moved the kernel into its own region and freed up region
7 and allowed us to revert to standard page-sized translations.

This commit inroduces pmap_page_to_va() that respects the attribute.


# 223732 02-Jul-2011 alc

When iterating over a paging queue, explicitly check for PG_MARKER, instead
of relying on zeroed memory being interpreted as an empty PV list.

Reviewed by: kib


# 223700 30-Jun-2011 marcel

Change the management of nested faults by switching to physical
addressing while reading or writing the trap frame. It's not
possible to guarantee that the one translation cache entry that
we depend on is not going to get purged by the CPU. We already
know that global shootdowns (ptc.g and/or ptc.ga) can (and will)
cause multiple TC entries to get purged and we initialize tried
to handle that by serializing kernel entry with these operations.
However, we need to serialize kernel exit as well.

But even if we can serialize, it appears that CPU threads within
a core can affect each other's TC entries beyond the global
shootdown. This would mean serializing any and all translatation
cache updates with the threads in a core with the kernel entry
and exit of any thread in that core. This is just too painful
and complicated.

Since we already properly coded for the 2 nested faults that we
can get, all we need to do is use those to obtain the physical
address of the trap frame, switch to physical mode and in that
way eliminate any further faults. The trap frame is already
aligned to 1KB boundaries to make sure we don't cross the page
boundary, this is safe to do.

We still need to serialize ptc.g or ptc.ga across CPUs because
the platform can only have 1 such operation outstanding at the
same time. We can now use a regular (spin) lock for this.

Also, it has been observed that we can get a nested TLB faults
for region 7 virtual addresses. This was unexpected. For now,
we enhance the nested TLB fault handler to deal with those as
well, but it needs to be understood.


# 223677 29-Jun-2011 alc

Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.

This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write(). It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.

Update all of the existing assertions on pmap_remove_all() to reflect this
change.

Reviewed by: kib


# 223170 17-Jun-2011 marcel

Properly serialize the global shootdown with the instruction
stream of the local processor. Also explicitly invalidate
the ALAT. This is done on the other CPUs in the coherence
domain by virtue of the ptc.ga instruction, but does not
apply to the local CPU.


# 222531 31-May-2011 nwhitehorn

On multi-core, multi-threaded PPC systems, it is important that the threads
be brought up in the order they are enumerated in the device tree (in
particular, that thread 0 on each core be brought up first). The SLIST
through which we loop to start the CPUs has all of its entries added with
SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration
and so AP startup would always fail in such situations (causing a machine
check or RTAS failure). Fix this by changing the SLIST into an STAILQ,
and inserting new CPUs at the end.

Reviewed by: jhb


# 221606 07-May-2011 marcel

In pmap_kextract(), return the physical address for PBVM virtual
addresses as well (incl. the PBVM page table).


# 221271 30-Apr-2011 marcel

Stop linking against a direct-mapped virtual address and instead
use the PBVM. This eliminates the implied hardcoding of the
physical address at which the kernel needs to be loaded. Using the
PBVM makes it possible to load the kernel irrespective of the
physical memory organization and allows us to replicate kernel text
on NUMA machines.

While here, reduce the direct-mapped page size to the kernel's
page size so that we can support memory attributes better.


# 219841 21-Mar-2011 marcel

Fix switching to physical mode as part of calling into EFI runtime
services or PAL procedures. The new implementation is based on
specific functions that are known to be called in certain scenarios
only. This in particular fixes the PAL call to obtain information
about translation registers. In general, the new implementation does
not bank on virtual addresses being direct-mapped and will work when
the kernel uses PBVM.

When new scenarios need to be supported, new functions are added if
the existing functions cannot be changed to handle the new scenario.
If a single generic implementation is possible, it will become clear
in due time.

While here, change bootinfo to a pointer type in anticipation of
future development.


# 219808 20-Mar-2011 marcel

Change region 4 to be part of the kernel. This serves 2 purposes:
1. The PBVM is in region 4, so if we want to make use of it, we
need region 4 freed up.
2. Region 4 and above cannot be represented by an off_t by virtue
of that type being signed. This is problematic for truss(1),
ktrace(1) and other such programs.


# 209085 11-Jun-2010 marcel

The ptc.g operation for the Mckinley and Madison processors has the
side-effect of purging more than the requested translation. While
this is not a problem in general, it invalidates the assumption made
during constructing the trapframe on entry into the kernel in SMP
configurations. The assumption is that only the first store to the
stack will possibly cause a TLB miss. Since the ptc.g purges the
translation caches of all CPUs in the coherency domain, a ptc.g
executed on one CPU can cause a purge on another CPU that is
currently running the critical code that saves the state to the
trapframe. This can cause an unexpected TLB miss and with interrupt
collection disabled this means an unexpected data nested TLB fault.

A data nested TLB fault will not save any context, nor provide a
way for software to determine what caused the TLB miss nor where
it occured. Careful construction of the kernel entry and exit code
allows us to handle a TLB miss in precisely orchastrated points
and thereby avoiding the need to wire the kernel stack, but the
unexpected TLB miss caused by the ptc.g instructution resulted in
an unrecoverable condition and resulting in machine checks.

The solution to this problem is to synchronize the kernel entry
on all CPUs with the use of the ptc.g instruction on a single CPU
by implementing a bare-bones readers-writer lock that allows N
readers (= N CPUs entering the kernel) and 1 writer (= execution
of the ptc.g instruction on some CPU). This solution wins over
a rendez-vous approach by not interrupting CPUs with an IPI.

This problem has not been observed on the Montecito.

PR: ia64/147772
MFC after: 6 days


# 209048 11-Jun-2010 alc

Relax one of the new assertions in pmap_enter() a little. Specifically,
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)


# 208990 10-Jun-2010 alc

Reduce the scope of the page queues lock and the number of
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines. Update a stale comment.

Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked. Add a comment documenting this.

Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.

Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick(). (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)

Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page. Assert this rather than returning "0"
and "FALSE" respectively.

ARM:

Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().

Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.

PowerPC/AIM:

moea*_page_exits_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete. Therefore, the check
for moea_initialized can be eliminated.

Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().

The last parameter to moea*_clear_bit() is never used. Eliminate it.

PowerPC/BookE:

Simplify mmu_booke_page_exists_quick()'s control flow.

Reviewed by: kib@


# 208659 30-May-2010 alc

Simplify the inner loop of get_pv_entry(): While iterating over the page's
pv list, there is no point in checking whether or not the pv list is empty,
wait instead until the loop completes.


# 208646 29-May-2010 alc

Don't set PG_WRITEABLE in pmap_enter() unless the page is managed.


# 208574 26-May-2010 alc

Push down page queues lock acquisition in pmap_enter_object() and
pmap_is_referenced(). Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively. In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.

Assert that the page is managed in pmap_is_referenced().

On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.

Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting. This scenario is described in detail by a
comment.

Correct a spelling error in vm_page_dontneed().

Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.

Add object locking to vnode_pager_generic_putpages(). This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.

Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().

Change vnode_pager_generic_putpages() to the modern-style of function
definition. Also, change the name of one of the parameters to follow
virtual memory system naming conventions.

Reviewed by: kib


# 208504 24-May-2010 alc

Roughly half of a typical pmap_mincore() implementation is machine-
independent code. Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified(). Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization. Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean(). Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by: kib (an earlier version)


# 208175 16-May-2010 alc

On entry to pmap_enter(), assert that the page is busy. While I'm
here, make the style of assertion used by pmap_enter() consistent
across all architectures.

On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.

With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy. Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.

Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try. Previously, the call to pmap_remove_write()
would have failed silently.


# 207796 08-May-2010 alc

Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free(). Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped(). (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.


# 207410 29-Apr-2010 kmacy

On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib


# 207373 29-Apr-2010 alc

MFamd64/i386 r207205
Clearing a page table entry's accessed bit and setting the page's
PG_REFERENCED flag in pmap_protect() can't really be justified, so
don't do it. Moreover, on ia64, don't set the page's dirty field
unless pmap_protect() is removing write access.


# 207155 24-Apr-2010 alc

Resurrect pmap_is_referenced() and use it in mincore(). Essentially,
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters. For example, in mincore(), clearing the reference
bits has two negative consequences. First, it throws off the activity
count calculations performed by the page daemon. Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should. Consequently, the page could be
deactivated prematurely by the page daemon. Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page. However, there is a second problem for which
that is not a solution. In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping. Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!


# 205454 22-Mar-2010 marcel

o Remove the pmap argument to pmap_invalidate_all() as it's not used
other than in a potentially dangerous KASSERT.
o Hand-inline pmap_remove_page() as it's only called from 1 place and
the abstraction that pmap_remove_page() provides is not enough to
warrant the obfuscation. Eliminate the dangerous KASSERT in the
process.
o In pmap_remove_pte(), remove the KASSERT for pmap being the current
one as it's not safe in the face of CPU migration.


# 205435 22-Mar-2010 marcel

Drop the pmap argument to pmap_invalidate_page(). It's not used other
than in a KASSERT. The KASSERT is broken in that it's done outside the
critical section and as such isn't protected against CPU migration.
Improve pmap_invalidate_page() as follows:
o calculate vhpt_ofs inside the critical region for exactly the same
reason.
o calculate the tag outside the FOREACH loop, as it's loop-invariant.
This is more efficient.
o Replace the test and set with an atomic cmpset operation because we
are changing other CPU's VHPT tables and this avoids invalidating
after the entry got modified. Not necessarily a problem, but better
safe than sorry.


# 204182 21-Feb-2010 marcel

Remove pm_active from struct pmap as it serves no purpose.

MFC after: 1 week


# 203883 14-Feb-2010 marcel

Some code churn:
o Eliminate IA64_PHYS_TO_RR6 and change all places where the macro is used
by calling either bus_space_map() or pmap_mapdev().
o Implement bus_space_map() in terms of pmap_mapdev() and implement
bus_space_unmap() in terms of pmap_unmapdev().
o Have ia64_pib hold the uncached virtual address of the processor interrupt
block throughout the kernel's life and access the elements of the PIB
through this structure pointer.

This is a non-functional change with the exception of using ia64_ld1() and
ia64_st8() to write to the PIB. We were still using assignments, for which
the compiler generates semaphore reads -- which cause undefined behaviour
for uncacheable memory. Note also that the memory barriers in ipi_send() are
critical for proper functioning.

With all the mapping of uncached memory done by pmap_mapdev(), we can keep
track of the translations and wire them in the CPU. This then eliminates
the need to reserve a whole region for uncached I/O and it eliminates
translation traps for device I/O accesses.


# 200207 07-Dec-2009 marcel

Define struct pcpu_md as the only MD field of struct pcpu (pc_acpi_id
excluded, as it's used by MI code) and mode the sysctl variables from
pcpu_stats to pcpu_md.
Adjust all references accordingly.

While nearby, change the PCPU sysctl tree so that they match the CPU
device sysctl tree -- they are now children of a static node called
"machdep.cpu" and are named only with their cpu ID.


# 200200 06-Dec-2009 marcel

Allocate the VHPT for each CPU in cpu_mp_start(), rather than
allocating MAXCPU VHPTs up-front. This allows us to max-out MAXCPU
without memory waste -- MAXCPU is now 32 for SMP kernels.

This change also eliminates the VHPT scaling based in the total
memory in the system. It's the workload that determines the best size
of the VHPT. The workload can be affected by the amount of memory,
but not necessarily. For example, there's no performance difference
between VHPT sizes of 256KB, 512KB and 1MB when building the LINT
kernel. This was observed with a system that has 8GB of memory.
By default the kernel will allocate a 1MB VHPT. The user can tune the
system with the "machdep.vhpt.log2size" tunable.


# 198341 21-Oct-2009 marcel

o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
that translates the vm_map_t argumument to pmap_t.
o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
it replaces the pmap_page_executable() function, added to solve
the I-cache problem in uiomove_fromphys().
o In proc_rwmem() call vm_sync_icache() when writing to a page that
has execute permissions. This assures that when breakpoints are
written, the I-cache will be coherent and the process will actually
hit the breakpoint.
o This also fixes the Book-E PMAP implementation that was missing
necessary locking while trying to deal with the I-cache coherency
in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.


# 195840 24-Jul-2009 jhb

Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses. The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks


# 195625 11-Jul-2009 marcel

On exec(2), when loading the ELF image, pmap_enter_object() is
called to prefault pages. This is an obvious place for making
sure the I-cache is coherent. It was missing though. As such,
execution over NFS and ZFS file systems was failing. NFS was
fixed the wrong way (by flushing the D-cache as part of the
NFS code) in a previous commit. ZFS problems were encountered
after that and indicated that something else was wrong...

Approved by: re (kib)


# 192324 18-May-2009 marcel

Rename ia64_invalidate_icache() to ia64_sync_icache(). We're
not invalidating anything.


# 187381 18-Jan-2009 alc

Correct an error in revision 1.170 of this file. When get_pv_entry() is
forced to reclaim pv entries, the one pv entry that it returns should not
be freed.


# 179190 22-May-2008 marcel

Create the bucket mutexes with MTX_NOWITNESS. There's now a
hard limit of 512 pending mutexes in the witness code and
we can easily have 1 million bucket mutexes initialized before
witness is up and running. Bumping the limit from 512 to 1M
is not really an option here...


# 179081 18-May-2008 alc

Retire pmap_addr_hint(). It is no longer used.


# 178893 09-May-2008 alc

Add a stub for pmap_align_superpage() on machines that don't (yet)
implement pmap-level support for superpages.


# 178309 19-Apr-2008 marcel

Sanitize the malloc types: M_PMAP is not used in pmap.c, so don't
define it there. Don't use M_PMAP in mp_machdep.c; define M_SMP
instead.


# 177769 30-Mar-2008 marcel

Better implement I-cache invalidation. The previous implementation
was a kluge. This implementation matches the behaviour on powerpc
and sparc64.
While on the subject, make sure to invalidate the I-cache after
loading a kernel module.

MFC after: 2 weeks


# 176286 14-Feb-2008 marcel

On Montecito processors, the instruction cache is in fact not
coherent with the data caches. Implement a quick fix to allow
us to boot on Montecito, while I'm working on a better fix in
the mean time.

Commit made on Montecito-based Itanium...


# 175067 03-Jan-2008 alc

Add an access type parameter to pmap_enter(). It will be used to implement
superpage promotion.

Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is
a Boolean.


# 175066 03-Jan-2008 imp

Use correct function name in panic message


# 175065 03-Jan-2008 imp

Fix obsolete comment. pmap_remove_all is the function we're in.


# 173708 17-Nov-2007 alc

Prevent the leakage of wired pages in the following circumstances:
First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated.
Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the
pages beyond the EOF are unmapped and freed. However, when the file is
mlock(2)ed, the pages beyond the EOF are unmapped but not freed because
they have a non-zero wire count. This can be a mistake. Specifically,
it is a mistake if the sole reason why the pages are wired is because of
wired, managed mappings. Previously, unmapping the pages destroys these
wired, managed mappings, but does not reduce the pages' wire count.
Consequently, when the file is unmapped, the pages are not unwired
because the wired mapping has been destroyed. Moreover, when the vm
object is finally destroyed, the pages are leaked because they are still
wired. The fix is to reduce the pages' wired count by the number of
wired, managed mappings destroyed. To do this, I introduce a new pmap
function pmap_page_wired_mappings() that returns the number of managed
mappings to the given physical page that are wired, and I use this
function in vm_object_page_remove().

Reviewed by: tegge
MFC after: 6 weeks


# 173361 05-Nov-2007 kib

Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with: Peter Holm
Reviewed by: jhb


# 171721 04-Aug-2007 marcel

Replace "__asm __volatile()" by equivalent support functions from
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add data-serialization after writing to the region registers and
add instruction-serialization after writing to cr.pta.

Approved by: re (blanket)


# 171663 30-Jul-2007 marcel

Explicitly map the VHPT on all processors. Previously we were
merely lucky that the VHPT was mapped as a side-effect of
mapping the kernel, but when there's enough physical memory,
this may not at all be the case.

Approved by: re (blanket)


# 170402 07-Jun-2007 marcel

Eliminate pmap_install(), which was used to wrap pmap_switch() and
grab sched_lock. This would serialize calls to pmap_switch from
cpu_switch(). With the introduction of thread_lock, this is not
possible anymore, because thread_lock is not a single lock. It
varies. Secondly and most importantly, it's not needed at all. The
only requirement for pmap_switch() is that it's not preempted
while in the middle of updating the CPU and PCPU. In other words,
it's a critical region. No locking required.


# 170307 04-Jun-2007 jeff

Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
sychronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


# 170170 31-May-2007 attilio

Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)


# 170026 27-May-2007 marcel

Have the processor defer all faults and exceptions for control
speculative loads. This at least makes control speculative loads
work. In the future we should analyze which faults/exceptions
we want to handle rather than defer to avoid having to call the
recovery code when it's not strictly necessary.


# 169773 19-May-2007 marcel

Fix GCC warning: va = va += PAGE_SIZE contains pointless operation
va = va. Fix white space in nearby lines.


# 169760 19-May-2007 marcel

Add a level of indirection to the kernel PTE table. The old
scheme allowed for 1024 PTE pages, each containing 256 PTEs.
This yielded 2GB of KVA. This is not enough to boot a kernel
on a 16GB box and in general too low for a 64-bit machine.
By adding a level of indirection we now have 1024 2nd-level
directory pages, each capable of supporting 2GB of KVA. This
brings the grand total to 2TB of KVA.


# 169667 18-May-2007 jeff

- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.

Contributed by: Attilio Rao <attilio@FreeBSD.org>


# 166860 21-Feb-2007 alc

Change pmap_protect() so that execute access can be removed without
simultaneously removing write access.


# 166631 11-Feb-2007 marcel

Now that the free page queue mutex is a sleep mutex, we cannot call
vm_page_alloc() from within a critical section in pmap_growkernel().
Since the need for a critical section may never have existed in the
first place, simply get rid of it.

Discussed with: alc@


# 164229 12-Nov-2006 alc

Make pmap_enter() responsible for setting PG_WRITEABLE instead
of its caller. (As a beneficial side-effect, a high-contention
acquisition of the page queues lock in vm_fault() is eliminated.)


# 163603 22-Oct-2006 alc

Eliminate unnecessary PG_BUSY tests.


# 160889 01-Aug-2006 alc

Complete the transition from pmap_page_protect() to pmap_remove_write().
Originally, I had adopted sparc64's name, pmap_clear_write(), for the
function that is now pmap_remove_write(). However, this function is more
like pmap_remove_all() than like pmap_clear_modify() or
pmap_clear_reference(), hence, the name change.

The higher-level rationale behind this change is described in
src/sys/amd64/amd64/pmap.c revision 1.567. The short version is that I'm
trying to clean up and fix our support for execute access.

Reviewed by: marcel@ (ia64)


# 159971 27-Jun-2006 alc

Make several changes to pmap_enter_quick_locked():

1. Make the caller responsible for performing pmap_install(). This reduces
the number of times that pmap_install() is performed by
pmap_enter_object() from twice per page to twice overall.

2. Don't block if pmap_find_pte() is unable to allocate a PTE. If it did
block, then it might wind up mapping a cache page. Specifically, if
pmap_enter_quick_locked() slept when called from pmap_enter_object(), the
page daemon could change an active or inactive page into a cache page just
before it was to be mapped.

3. Bail out of pmap_enter_quick_locked() if pv entries aren't plentiful.
In other words, don't force the allocation of a pv entry if they aren't
readily available.

Reviewed by: marcel@


# 159627 14-Jun-2006 ups

Remove mpte optimization from pmap_enter_quick().
There is a race with the current locking scheme and removing
it should have no measurable performance impact.
This fixes page faults leading to panics in pmap_enter_quick_locked()
on amd64/i386.

Reviewed by: alc,jhb,peter,ps


# 159303 05-Jun-2006 alc

Introduce the function pmap_enter_object(). It maps a sequence of resident
pages from the same object. Use it in vm_map_pmap_enter() to reduce the
locking overhead of premapping objects.

Reviewed by: tegge@


# 157680 12-Apr-2006 alc

Retire pmap_track_modified(). We no longer need it because we do not
create managed mappings within the clean submap. To prevent regressions,
add assertions blocking the creation of managed mappings within the clean
submap.

Reviewed by: tegge


# 157443 03-Apr-2006 peter

Remove the unused sva and eva arguments from pmap_remove_pages().


# 152630 20-Nov-2005 alc

Eliminate pmap_init2(). It's no longer used.


# 152359 13-Nov-2005 alc

In get_pv_entry() use PMAP_LOCK() instead of PMAP_TRYLOCK() when deadlock
cannot possibly occur.


# 152224 09-Nov-2005 alc

Reimplement the reclamation of PV entries. Specifically, perform
reclamation synchronously from get_pv_entry() instead of
asynchronously as part of the page daemon. Additionally, limit the
reclamation to inactive pages unless allocation from the PV entry zone
or reclamation from the inactive queue fails. Previously, reclamation
destroyed mappings to both inactive and active pages. get_pv_entry()
still, however, wakes up the page daemon when reclamation occurs. The
reason being that the page daemon may move some pages from the active
queue to the inactive queue, making some new pages available to future
reclamations.

Print the "reclaiming PV entries" message at most once per minute, but
don't stop printing it after the fifth time. This way, we do not give
the impression that the problem has gone away.

Reviewed by: tegge


# 152042 04-Nov-2005 alc

Begin and end the initialization of pvzone in pmap_init().
Previously, pvzone's initialization was split between pmap_init() and
pmap_init2(). This split initialization was the underlying cause of
some UMA panics during initialization. Specifically, if the UMA boot
pages was exhausted before the pvzone was fully initialized, then UMA,
through no fault of its own, would use an inappropriate back-end
allocator leading to a panic. (Previously, as a workaround, we have
increased the UMA boot pages.) Fortunately, there is no longer any
reason that pvzone's initialization cannot be completed in
pmap_init().

Eliminate a check for whether pv_entry_high_water has been initialized
or not from get_pv_entry(). Since pvzone's initialization is
completed in pmap_init(), this check is no longer needed.

Use cnt.v_page_count, the actual count of available physical pages,
instead of vm_page_array_size to compute the maximum number of pv
entries.

Introduce the vm.pmap.pv_entries tunable on alpha and ia64.

Eliminate some unnecessary white space.

Discussed with: tegge (item #1)
Tested by: marcel (ia64)


# 152002 03-Nov-2005 alc

Remove the remaining spl*() calls. Add some assertions. Eliminate some
excessive white space.


# 151543 21-Oct-2005 ade

Specifically panic() in the case where pmap_insert_entry() fails to
get a new pv under high system load where the available pv entries
have been exhausted before the pagedaemon has a chance to wake up
to reclaim some.

Prior to this, the NULL pointer dereference ended up causing
secondary panics with rather less than useful resulting tracebacks.

Reviewed by: alc, jhb
MFC after: 1 week


# 149806 05-Sep-2005 marcel

o In pmap_remove_pte: always invalidate the page. Previously the page
was not invalidated if the PTE was not actually being removed. In
an UP kernel this didn't cause problems, because the new mapping
would preempt the old one. In an SMP kernel this could lead to the
use of stale translations when processes move between CPUs at the
"right" moment. This fixes the last of the obvious SMP problems
and it should be safe to enable SMP by default now.
o In pmap_remove_pte: minor code refactoring to avoid duplication.
o Test all PTE pointers against NULL. Don't use implicit boolean
tests.


# 149777 03-Sep-2005 marcel

o s/vhpt_size/pmap_vhpt_log2size/g
o s/vhpt_base/pmap_vhpt_base/g
o s/vhpt_bucket/pmap_vhpt_bucket/g
o Declare the above in <machine/pmap.h>
o Move the vm.stats.vhpt.* sysctls to machdep.vhpt.*
o Create a tunable machdep.vhpt.log2size, with corresponding sysctl.
The tunable allows the user to specify the VHPT size from the loader.
o Don't keep track of the number of PTEs in the VHPT. Calculate the
population when necessary by iterating the buckets and summing up
the length of the buckets.
o Don't perform the tpa instruction with a bucket lock held. The
instruction can (theoretically) fault and locking is not needed.


# 149770 03-Sep-2005 marcel

Fix collision chain termination checks. The result of IA64_PHYS_TO_RR7
is never 0, so one cannot test for a NULL pointer after a physical
address is translated into a virtual pointer with said macro. Instead,
keep the physical address around and test it against 0. Note that
this obviously implies that a PTE can never be allocated at physical
address 0. This isn't exactly guaranteed, but hasn't been a problem
so far. We test the physical address against 0 for as long as the ia64
port exists...


# 149768 03-Sep-2005 alc

Pass a value of type vm_prot_t to pmap_enter_quick() so that it determine
whether the mapping should permit execute access.


# 149037 13-Aug-2005 marcel

o s/pmap_lpte_/pmap_/g
o Remove pmap_is_referenced(). It was already compiled-out.


# 148807 06-Aug-2005 marcel

Improve SMP support:
o Allocate a VHPT per CPU. The VHPT is a hash table that the CPU
uses to look up translations it can't find in the TLB. As such,
the VHPT serves as a level 1 cache (the TLB being a level 0 cache)
and best results are obtained when it's not shared between CPUs.
The collision chain (i.e. the hash bucket) is shared between CPUs,
as all buckets together constitute our collection of PTEs. To
achieve this, the collision chain does not point to the first PTE
in the list anymore, but to a hash bucket head structure. The
head structure contains the pointer to the first PTE in the list,
as well as a mutex to lock the bucket. Thus, each bucket is locked
independently of each other. With at least 1024 buckets in the VHPT,
this provides for sufficiently finei-grained locking to make the
ssolution scalable to large SMP machines.
o Add synchronisation to the lazy FP context switching. We do this
with a seperate per-thread lock. On SMP machines the lazy high FP
context switching without synchronisation caused inconsistent
state, which resulted in a panic. Since the use of the high FP
registers is not common, it's possible that races exist. The ia64
package build has proven to be a good stress test, so this will
get plenty of exercise in the near future.
o Don't use the local ID of the processor we want to send the IPI to
as the argument to ipi_send(). use the struct pcpu pointer instead.
The reason for this is that IPI delivery is unreliable. It has been
observed that sending an IPI to a CPU causes it to receive a stray
external interrupt. As such, we need a way to make the delivery
reliable. The intended solution is to queue requests in the target
CPU's per-CPU structure and use a single IPI to inform the CPU that
there's a new entry in the queue. If that IPI gets lost, the CPU
can check it's queue at any convenient time (such as for each
clock interrupt). This also allows us to send requests to a CPU
without interrupting it, if such would be beneficial.

With these changes SMP is almost working. There are still some random
process crashes and the machine can hang due to having the IPI lost
that deals with the high FP context switch.

The overhead of introducing the hash bucket head structure results
in a performance degradation of about 1% for UP (extra pointer
indirection). This is surprisingly small and is offset by gaining
reasonably/good scalable SMP support.


# 147217 10-Jun-2005 alc

Introduce a procedure, pmap_page_init(), that initializes the
vm_page's machine-dependent fields. Use this function in
vm_pageq_add_new_page() so that the vm_page's machine-dependent and
machine-independent fields are initialized at the same time.

Remove code from pmap_init() for initializing the vm_page's
machine-dependent fields.

Remove stale comments from pmap_init().

Eliminate the Boolean variable pmap_initialized from the alpha, amd64,
i386, and ia64 pmap implementations. Its use is no longer required
because of the above changes and earlier changes that result in physical
memory that is being mapped at initialization time being mapped without
pv entries.

Tested by: cognet, kensmith, marcel


# 145173 16-Apr-2005 marcel

Add a kpte command to DDB. It dumps the PTE of a KVA. This helps
to analyze faults and TLB/VHPT inconsistencies.


# 139790 06-Jan-2005 imp

/* -> /*- for copyright notices, minor format tweaks as necessary


# 139241 23-Dec-2004 alc

Modify pmap_enter_quick() so that it expects the page queues to be locked
on entry and it assumes the responsibility for releasing the page queues
lock if it must sleep.

Remove a bogus comment from pmap_enter_quick().

Using the first change, modify vm_map_pmap_enter() so that the page queues
lock is acquired and released once, rather than each time that a page
is mapped.


# 138897 15-Dec-2004 alc

In the common case, pmap_enter_quick() completes without sleeping.
In such cases, the busying of the page and the unlocking of the
containing object by vm_map_pmap_enter() and vm_fault_prefault() is
unnecessary overhead. To eliminate this overhead, this change
modifies pmap_enter_quick() so that it expects the object to be locked
on entry and it assumes the responsibility for busying the page and
unlocking the object if it must sleep. Note: alpha, amd64, i386 and
ia64 are the only implementations optimized by this change; arm,
powerpc, and sparc64 still conservatively busy the page and unlock the
object within every pmap_enter_quick() call.

Additionally, this change is the first case where we synchronize
access to the page's PG_BUSY flag and busy field using the containing
object's lock rather than the global page queues lock. (Modifications
to the page's PG_BUSY flag and busy field have asserted both locks for
several weeks, enabling an incremental transition.)


# 138752 12-Dec-2004 marcel

Fix the last of the instability and the cause of the annoying
"vm_fault: fault on nofault entry, addr: %lx" panic. The problem was a
stale PTE in the TLB that marked the page as not present, even though
we had a good PTE in the VHPT. We typically don't yet insert PTEs in
the TLB. We do that lazily. The CPU will look for the PTE in the VHPT
when there's no PTE in the TLB. Unfortunately this doesn't handle the
case of the stale PTE in the TLB. The quick fix is to invalidate the
TLB (sloppily) when the VHPT doesn't contain a valid PTE. This is also
the only case that may cause a PTE in the TLB that marks a page as
non-present.


# 137978 21-Nov-2004 marcel

Remove struct ia64_itir and use a plain old uint64_t instead.


# 136070 02-Oct-2004 alc

The physical address stored in the vm_page is page aligned. There is no
need to mask off the page offset bits. (This operation made some sense
prior to i386/i386/pmap.c revision 1.254 when we passed a physical address
rather than a vm_page pointer to pmap_enter().)


# 136050 02-Oct-2004 alc

Eliminate unnecessary uses of PHYS_TO_VM_PAGE() from pmap_enter(). These
uses predate the change in the pmap_enter() interface that replaced the
page's physical address by the address of its vm_page structure. The
PHYS_TO_VM_PAGE() was being used to compute the address of the same vm_page
structure that was being passed in.


# 135590 22-Sep-2004 marcel

Redefine a PTE as a 64-bit integral type instead of a struct of
bit-fields. Unify the PTE defines accordingly and update all
uses.


# 135589 22-Sep-2004 marcel

s/u_int#_t/uint#_t/g


# 135443 18-Sep-2004 alc

Release the page queues lock earlier in pmap_protect() and pmap_remove() in
order to reduce contention.


# 134509 30-Aug-2004 alc

Remove unnecessary check for curthread == NULL.


# 134393 27-Aug-2004 alc

The machine-independent parts of the virtual memory system always pass a
valid pmap to the pmap functions that require one. Remove the checks for
NULL. (These checks have their origins in the Mach pmap.c that was
integrated into BSD. None of the new code written specifically for
FreeBSD included them.)


# 133405 09-Aug-2004 marcel

Better preserve the original protection for the mappings we maintain.
The hardware always gives read access for privilege level 0, which
means that we cannot use the hardware access rights and privilege
level in the PTE to test whether there's a change in protection. So,
we save the original vm_prot_t in the PTE as well.
Add pmap_pte_prot() to set the proper access rights and privilege
level on the PTE given a pmap and the requested protection.

The above allows us to compare the protection in pmap_extract_and_hold()
which was missing. While in pmap_extract_and_hold(), add pmap locking.

While here, clean up most (i.e. all but one) PTE macros we inherited
from alpha. They were either unused, used inconsistently, badly named
or simply weren't beneficial. We save the wired and managed state of
the PTE in distinct (bit) fields.

While in pte.h, s/u_int64_t/uint64_t/g

pmap locking obtained from: alc@
feedback & review by: alc@


# 132897 30-Jul-2004 alc

- Add pmap locking to ia64's pmap_enter() and pmap_enter_quick(). (This
brings ia64 to parity with alpha, amd64, and i386 in this area.)
- Prevent a race in pmap_find_pte(): If pmap_find_pte() sleeps in
uma_zalloc(), another thread could allocate a pte at the same address.
Instead, sleep at a higher level and retry the lookup before retrying
the allocation.

Reviewed and tested by: marcel@


# 132522 22-Jul-2004 alc

In pmap_mincore() create a private copy of the pte for use after the pmap
lock is released.


# 132487 21-Jul-2004 alc

Additional pmap locking

Tested by: marcel@


# 132378 19-Jul-2004 alc

Add partial pmap locking.

Tested by: marcel@


# 132235 16-Jul-2004 alc

Remove unused fields from the pmap.


# 132220 15-Jul-2004 alc

Push down the acquisition and release of the page queues lock into
pmap_protect() and pmap_remove(). In general, they require the lock in
order to modify a page's pv list or flags. In some cases, however,
pmap_protect() can avoid acquiring the lock.


# 132170 15-Jul-2004 alc

A loop in pmap_remove() should use TAILQ_FOREACH_SAFE(), not
TAILQ_FOREACH(), because the loop deletes elements from the list.

Reviewed by: marcel@


# 132085 13-Jul-2004 alc

Simplify pmap_protect().


# 132082 13-Jul-2004 alc

Push down the acquisition and release of the page queues lock into
pmap_remove_pages(). (The implementation of pmap_remove_pages() is
optional. If pmap_remove_pages() is unimplemented, the acquisition and
release of the page queues lock is unnecessary.)

Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().


# 131662 05-Jul-2004 alc

- Correct pmap_extract()'s return type. It should be vm_paddr_t, not
vm_offset_t.
- Convert pmap_extract() to the ANSI style of declaration.


# 130742 19-Jun-2004 alc

Remove dead code related to pv entry allocation.

Reviewed by: marcel@


# 130362 11-Jun-2004 alc

Neither pmap_enter() nor pmap_enter_quick() should create pv entries for
unmanaged pages.

Tested by: marcel@


# 130338 11-Jun-2004 alc

Reduce the number of preallocated pv entries and lpte entries in
pmap_init().

Tested by: marcel@


# 128105 11-Apr-2004 alc

Remove a comment that refers to avail_start and avail_end as these
variables no longer exist.


# 128097 10-Apr-2004 alc

- pmap_kenter_temporary() is unused by machine-independent code. Therefore,
move its declaration to the machine-dependent header file on those
machines that use it. In principle, only i386 should have it.
Alpha and AMD64 should use their direct virtual-to-physical mapping.
- Remove pmap_kenter_temporary() from ia64. It is unused. Approved
by: marcel@


# 127922 05-Apr-2004 alc

Remove avail_end. As of yesterday, it is unused.


# 127875 05-Apr-2004 alc

Remove avail_start on those platforms that no longer use it. (Only amd64
does anything with it beyond simple initialization.)


# 127869 04-Apr-2004 alc

Remove unused arguments from pmap_init().


# 126728 07-Mar-2004 alc

Retire pmap_pinit2(). Alpha was the last platform that used it. However,
ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps,
there has been no reason for having this function on Alpha. Briefly,
when pmap_growkernel() relied upon the list of all processes to find and
update the various pmaps to reflect a growth in the kernel's valid
address space, pmap_init2() served to avoid a race between pmap
initialization and pmap_growkernel(). Specifically, pmap_pinit2() was
responsible for initializing the kernel portions of the pmap and
pmap_pinit2() was called after the process structure contained a pointer
to the new pmap for use by pmap_growkernel(). Thus, an update to the
kernel's address space might be applied to the new pmap unnecessarily,
but an update would never be lost.


# 126716 07-Mar-2004 alc

Integrate the code from pmap_pinit2() into pmap_pinit(), leaving
pmap_pinit2() empty.

Approved by: marcel


# 120914 08-Oct-2003 marcel

Include <sys/smp.h> for the prototype of smp_rendezvous().


# 120722 03-Oct-2003 alc

Migrate pmap_prefault() into the machine-independent virtual memory layer.

A small helper function pmap_is_prefaultable() is added. This function
encapsulate the few lines of pmap_prefault() that actually vary from
machine to machine. Note: pmap_is_prefaultable() and pmap_mincore() have
much in common. Going forward, it's worth considering their merger.


# 120294 20-Sep-2003 marcel

Move uma_small_alloc() and uma_small_free() to uma_machdep.c. These
functions reference UMA internals from <vm/uma_int.h>, which makes
them highly unwanted in non-UMA specific files.

While here, prune the includes in pmap.c and use __FBSDID(). Move
the includes above the descriptive comment.

The copyright of uma_machdep.c is assigned to the project and can
be reassigned to the foundation if and when when such is preferrable.


# 119999 12-Sep-2003 alc

Add a new parameter to pmap_extract_and_hold() that is needed to eliminate
Giant from vmapbuf().

Idea from: tegge


# 119906 09-Sep-2003 marcel

Introduce IA64_ID_PAGE_{MASK|SHIFT|SIZE} and LOG2_ID_PAGE_SIZE. The
latter is a kernel option for IA64_ID_PAGE_SHIFT, which in turn
determines IA64_ID_PAGE_MASK and IA64_ID_PAGE_SIZE.

The constants are used instead of the literal hardcoding (in its
various forms) of the size of the direct mappings created in region
6 and 7. The default and probably only workable size is still 256M,
but for kicks we use 128M for LINT.


# 119869 08-Sep-2003 alc

Introduce a new pmap function, pmap_extract_and_hold(). This function
atomically extracts and holds the physical page that is associated with the
given pmap and virtual address. Such a function is needed to make the
memory mapping optimizations used by, for example, pipes and raw disk I/O
MP-safe.

Reviewed by: tegge


# 119861 07-Sep-2003 alc

MFamd64/i386
Add necessary page locking to pmap_mincore().


# 118640 07-Aug-2003 marcel

MFi386 1.422 & 1.423: lock page queues in pmap_insert_entry().


# 118244 31-Jul-2003 bmilekic

Make sure that when the PV ENTRY zone is created in pmap, that it's
created not only with UMA_ZONE_VM but also with UMA_ZONE_NOFREE. In
the i386 case in particular, the pmap code would hook a special
page allocation routine that allocated from kernel_map and not kmem_map,
and so when/if the pageout daemon drained the zones, it could actually
push out slabs from the PV ENTRY zone but call UMA's default page_free,
which resulted in pages allocated from kernel_map being freed to
kmem_map; bad. kmem_free() ignores the return value of the
vm_map_delete and just returns. I'm not sure what the exact
repercussions could be, but it doesn't look good.

In the PAE case on i386, we also set-up a zone in pmap, so be
conservative for now and make that zone also ZONE_NOFREE and
ZONE_VM. Do this for the pmap zones for the other archs too,
although in some cases it may not be entirely necessarily. We'd
rather be safe than sorry at this point.

Perhaps all UMA_ZONE_VM zones should by default be also
UMA_ZONE_NOFREE?

May fix some of silby's crashes on the PV ENTRY zone.


# 118237 30-Jul-2003 peter

Remove leftover relic of pmap_new_thread() etc.


# 118024 25-Jul-2003 alc

MFi386 revision 1.416
Add vm object locking to pmap_prefault().

Note: powerpc and sparc64 do not implement this function.


# 117206 03-Jul-2003 alc

Background: pmap_object_init_pt() premaps the pages of a object in
order to avoid the overhead of later page faults. In general, it
implements two cases: one for vnode-backed objects and one for
device-backed objects. Only the device-backed case is really
machine-dependent, belonging in the pmap.

This commit moves the vnode-backed case into the (relatively) new
function vm_map_pmap_enter(). On amd64 and i386, this commit only
amounts to code rearrangement. On alpha and ia64, the new machine
independent (MI) implementation of the vnode case is smaller and more
efficient than their pmap-based implementations. (The MI
implementation takes advantage of the fact that objects in -CURRENT
are ordered collections of pages.) On sparc64, pmap_object_init_pt()
hadn't (yet) been implemented.


# 117045 29-Jun-2003 alc

- Export pmap_enter_quick() to the MI VM. This will permit the
implementation of a largely MI pmap_object_init_pt() for vnode-backed
objects. pmap_enter_quick() is implemented via pmap_enter() on sparc64
and powerpc.
- Correct a mismatch between pmap_object_init_pt()'s prototype and its
various implementations. (I plan to keep pmap_object_init_pt() as
the MD hook for device-backed objects on i386 and amd64.)
- Correct an error in ia64's pmap_enter_quick() and adjust its interface
to match the other versions. Discussed with: marcel


# 117022 29-Jun-2003 alc

- Remove the calls to pmap_install() from pmap_object_init_pt(); they are
redundant. Discussed with: marcel
- MFi386: Add vm object locking to pmap_object_init_pt().


# 116510 18-Jun-2003 alc

Fix a performance bug in all of the various implementations of
uma_small_alloc(): They always zeroed the page regardless of what the
caller requested.


# 116355 14-Jun-2003 alc

Migrate the thread stack management functions from the machine-dependent
to the machine-independent parts of the VM. At the same time, this
introduces vm object locking for the non-i386 platforms.

Two details:

1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The
different machine-dependent implementations used various combinations
of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable guard page, set
KSTACK_GUARD_PAGES to 0.

2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In
5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed
to vm_page_alloc() or vm_page_grab().


# 116328 14-Jun-2003 alc

Move the *_new_altkstack() and *_dispose_altkstack() functions out of the
various pmap implementations into the machine-independent vm. They were
all identical.


# 115937 07-Jun-2003 marcel

pmap_find_vhpt() has been observed to return a NULL pointer when
the caller assumes this to not happen by means of performing an
indirection without checking the return value. Add KASSERTs to
force a kernel with INVARIANTS to panic. This is a short-term
measure. The pmap code is scheduled to be overhauled.


# 115339 26-May-2003 marcel

Revision 1.99 of this file changed the allocation request from
VM_ALLOC_INTERRUPT to VM_ALLOC_SYSTEM. There was no mention of
this in commit log as it was considered harmless. Guess what:
it does harm. WITNESS showed that we can not safely grab the
page queue lock in vm_page_alloc() in all cases as we may have
to sleep on it. Revert the request to VM_ALLOC_INTERRUPT to
circumvent this. We panic if vm_page_alloc returns 0. I'm not
entirely happy about this, but we have bigger fish to fry.

Approved by: re@ (blanket)


# 115176 20-May-2003 marcel

Prevent corruption of the VHPT collision chain by protecting it with
a mutex. The only volatile chain operations are insertion and deletion
but since updating an existing PTE also updates the VHPT entry itself,
and we have the VHPT mutex in both other cases, we also lock when we
update an existing PTE even though no chain operation is involved.
Note that we perform the insertion and deletion careful enough that
we don't need to lock traversals. If we need to lock traversals, we
also need to lock from the exception handler, which we can't without
creating a trapframe.

We're now able to withstand a -j8 buildworld. More work is needed to
withstand Murphy fields. In other words: we still have a bogon...

Approved by: re@ (blanket)


# 115152 19-May-2003 marcel

Turn pmap_install_pte() into a critical section. We better not get
interrupted while writing into the VHPT table. While here, make sure
memory accesses a properly ordered. Tag invalidation must happen
first so that the hardware VHPT walker will not be able to match
this entry while we're updating it and we have to make sure the new
new tag gets written only after the PTE is completely updated.

Approved by: re (blanket)


# 115149 19-May-2003 marcel

Unconditionally set pcb_current_pmap. WIP versions of the code
previously committed cleared pcb_current_pmap prior to changing
the region registers, but that was removed before committing.
Since we don't normally (at all?) pass a NULL pointer, the bug
was mostly harmless. Fix it while I'm here...

I'm here because we need to have data serialization after writing
to the region registers. Not doing so was likely the cause of the
hangs we were experiencing. General exceptions in cpu_switch may
also be caused by the lack of serialization.

Approved by: re (blanket)


# 115148 19-May-2003 marcel

pmap_install() needs to be atomic WRT to context switching. Protect
switching user regions (region 0-4) with schedlock. Avoid unnecessary
recursion on schedlock by moving the core functionality to another
function (pmap_switch()) where we assert schedlock is held. Turn
pmap_install() into a wrapper that grabs schedlock. This minimizes
the number of callsites that need to be changed.
Since we already have schedlock in cpu_switch() and cpu_throw(),
have them call pmap_switch() directly. These were also the only two
calls to pmap_install() outside pmap.c, so make pmap_install() static
and remove its prototype from pmap.h

Approved by: re (blanket)


# 115084 16-May-2003 marcel

Revamp of the syscall path, exception and context handling. The
prime objectives are:
o Implement a syscall path based on the epc inststruction (see
sys/ia64/ia64/syscall.s).
o Revisit the places were we need to save and restore registers
and define those contexts in terms of the register sets (see
sys/ia64/include/_regset.h).

Secundairy objectives:
o Remove the requirement to use contigmalloc for kernel stacks.
o Better handling of the high FP registers for SMP systems.
o Switch to the new cpu_switch() and cpu_throw() semantics.
o Add a good unwinder to reconstruct contexts for the rare
cases we need to (see sys/contrib/ia64/libuwx)

Many files are affected by this change. Functionally it boils
down to:
o The EPC syscall doesn't preserve registers it does not need
to preserve and places the arguments differently on the stack.
This affects libc and truss.
o The address of the kernel page directory (kptdir) had to
be unstaticized for use by the nested TLB fault handler.
The name has been changed to ia64_kptdir to avoid conflicts.
The renaming affects libkvm.
o The trapframe only contains the special registers and the
scratch registers. For syscalls using the EPC syscall path
no scratch registers are saved. This affects all places where
the trapframe is accessed. Most notably the unaligned access
handler, the signal delivery code and the debugger.
o Context switching only partly saves the special registers
and the preserved registers. This affects cpu_switch() and
triggered the move to the new semantics, which additionally
affects cpu_throw().
o The high FP registers are either in the PCB or on some
CPU. context switching for them is done lazily. This affects
trap().
o The mcontext has room for all registers, but not all of them
have to be defined in all cases. This mostly affects signal
delivery code now. The *context syscalls are as of yet still
unimplemented.

Many details went into the removal of the requirement to use
contigmalloc for kernel stacks. The details are mostly CPU
specific and limited to exception_save() and exception_restore().
The few places where we create, destroy or switch stacks were
mostly simplified by not having to construct physical addresses
and additionally saving the virtual addresses for later use.

Besides more efficient context saving and restoring, which of
course yields a noticable speedup, this also fixes the dreaded
SMP bootup problem as a side-effect. The details of which are
still not fully understood.

This change includes all the necessary backward compatibility
code to have it handle older userland binaries that use the
break instruction for syscalls. Support for break-based syscalls
has been pessimized in favor of a clean implementation. Due to
the overall better performance of the kernel, this will still
be notived as an improvement if it's noticed at all.

Approved by: re@ (jhb)


# 115063 16-May-2003 marcel

o In pmap_install, don't prevent switching the pmap if we're
switching to kernel_pmap. The pmap is not special enough.
o Clear the active bit on the pmap we're switching out.
o Fix some nearby style(9) bugs.

Approved by: re@


# 115059 16-May-2003 marcel

Indent a comment. This makes 1.100.

Still approved by: re@ (blanket)


# 115058 16-May-2003 marcel

Turn pmap_growkernel() into a critical section. While here, initialize
kernel_vm_end in pmap_bootstrap. Don't delay the initialization until
we need to grow the kernel VM space. This BTW happens twice before
we enter either single- or multi-user mode. Don't adjust kernel_vm_end
while growing based on whether the KPT contains a non-NULL entry. We
trust kernel_vm_end to be correct and we make sure it's still correct
after growing.
Define virtual_avail and virtual_end in terms of VM_MIN_KERNEL_ADDRESS
and VM_MAX_KERNEL_ADDRESS (resp). Don't hardcode region knowledge.


# 115057 16-May-2003 marcel

Revamp the RID allocation code:
o Limit the size of the region ID map to 64KB. This gives a bitmap
that is large enough to keep track of 2^19 numbers. The minimal map
size is 32KB. The reason we limit the map size is that processor
models may have implemented a 24-bit region ID, which would give
a 2MB bitmap while the maximum number of allocations is always
less than PID_MAX*5, which is less than 2^19.
o Allocate all region IDs up-front. The slight downside of reserving
more RIDs then a process needs (3 for ia64 native and 1 for ia32)
is preferable over the call to pmap_ensure_rid() where RIDs are
allocated on demand. On SMP systems this may lead to a race
condition.
o When allocating a region ID, don't use arc4random(). We're not
interested in randomness or uniform distribution across the
spectrum. We only need uniqueness. Random numbers may easily
collide when the number of allocated RIDs is high, creating a
possibly unbounded retry rate.


# 115056 16-May-2003 marcel

Move the conditional definition of KSTACK_MAX_PAGES up ahead where
it's more visible.

Approved by: re@ (blanket)


# 113831 21-Apr-2003 marcel

Don't use the tpa instruction to implement pmap_kextract. The tpa
instruction requires that a translation is present in the TC. This
may trigger a TLB miss and a subsequent call to vm_fault().
This implementation is deliberately non-inline for debugging and
profiling purposes. Partial or full inlining should eventually be
done.

Valuable insights by: jake


# 111462 25-Feb-2003 mux

Cleanup of the d_mmap_t interface.

- Get rid of the useless atop() / pmap_phys_address() detour. The
device mmap handlers must now give back the physical address
without atop()'ing it.
- Don't borrow the physical address of the mapping in the returned
int. Now we properly pass a vm_offset_t * and expect it to be
filled by the mmap handler when the mapping was successful. The
mmap handler must now return 0 when successful, any other value
is considered as an error. Previously, returning -1 was the only
way to fail. This change thus accidentally fixes some devices
which were bogusly returning errno constants which would have been
considered as addresses by the device pager.
- Garbage collect the poorly named pmap_phys_address() now that it's
no longer used.
- Convert all the d_mmap_t consumers to the new API.

I'm still not sure wheter we need a __FreeBSD_version bump for this,
since and we didn't guarantee API/ABI stability until 5.1-RELEASE.

Discussed with: alc, phk, jake
Reviewed by: peter
Compile-tested on: LINT (i386), GENERIC (alpha and sparc64)
Runtime-tested on: i386


# 111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 110959 15-Feb-2003 marcel

Fix misuse of Maxmem in the calculation of the VHPT size. Maxmem
is already in pages, so we should not convert from bytes to pages.
The result of this bug was bad scaling of the VHPT relative to the
available memory.

Submitted by: Arun Sharma <arun@sharma-home.net>


# 110784 13-Feb-2003 alc

MFi386
Remove kptobj. Instead, use VM_ALLOC_NOOBJ.


# 110211 01-Feb-2003 marcel

Remove special casing for running in the simulator from the kernel
and instead add platform, firmware and EFI stubs to the loader.
The net effect of this change is that besides a special console and
disk driver, the kernel has no knowledge of the simulator. This has
the following advantages:
o Simulator support is much harder to break,
o It's easier to make use of more feature complete simulators.
This would only need a change in the simulator specific loader,
o Running SMP kernels within the simulator. Note that ski at this
time does not simulate IPIs, so there's no way to start APs.

The platform, firmware and EFI stubs describe the following hardware:
o 4 CPU Itanium,
o 128 MB RAM within the 4GB address space,
o 64 MB RAM above the 4GB address space.

NOTE: The stubs in the skiloader describe a machine that should in
parts be defined by the simulator. Things like processor interrupt
block and AP wakeup vector cannot be choosen at random because they
require interpretation by the simulator. Currently the simulator is
ignorant of this.

This change introduces an unofficial SSC call SSC_SAL_SET_VECTORS
which is ignored by the simulator.

Tested with: ski (version 0.943 for linux)


# 109799 24-Jan-2003 dfr

Fix pmap_extract so that it doesn't panic if the user types
'cat /proc/pid/map'

Submitted by: Arun Sharma <arun.sharma@intel.com>


# 109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 109615 21-Jan-2003 jeff

- Add a VM_WAIT in the appropriate cases where vm_page_alloc() fails and flags
indicate that uma_small_alloc should not. This code should be refactored so
that there is not so much cross arch duplication.

Reviewed by: jake
Spotted by: tmm
Tested on: alpha, sparc64
Pointy hat to: jeff and everyone who cut and pasted the bad code. :-)


# 108643 04-Jan-2003 alc

Hold the page queues lock around pmap_remove_pte() in pmap_enter().

Submitted by: Arun Sharma <adsharma@unix-os.sc.intel.com>


# 107394 29-Nov-2002 marcel

Better handle sparse physical memory: Don't use the address range
as a measure for available memory to scale the VHPT. Instead, use
the previously determined Maxmem.

Approved by: re (carte blanc)


# 107028 17-Nov-2002 alc

MFi386 r1.369
- Clear the PG_WRITEABLE flag in pmap_page_protect() if write access is
being removed. Return immediately if write access is being removed and
PG_WRITEABLE is already clear.


# 106838 13-Nov-2002 alc

Move pmap_collect() out of the machine-dependent code, rename it
to reflect its new location, and add page queue and flag locking.

Notes: (1) alpha, i386, and ia64 had identical implementations
of pmap_collect() in terms of machine-independent interfaces;
(2) sparc64 doesn't require it; (3) powerpc had it as a TODO.


# 106753 11-Nov-2002 alc

- Clear the page's PG_WRITEABLE flag in the i386's pmap_changebit()
if we're removing write access from the page's PTEs.
- Export pmap_remove_all() on alpha, i386, and ia64. (It's already
exported on sparc64.)


# 106486 06-Nov-2002 marcel

Define UMA_MD_SMALL_ALLOC so that we can allocate memory with region
7 addresses for use by page tables and kernel stacks.

Obtained from: peter


# 105001 12-Oct-2002 marcel

Polish previous commit:
o Replace KSTACK_PAGES with pages on panic() in pmap_new_thread(),
o Fix style bugs in adjacent code,
o Use NULL instead of 0 for pointers,
o Save the virtual kstack address if we create an alternate
kstack because 1) we can derive the physical (RR7) address
from it and 2) we need the virtual address for contigfree()
in pmap_dispose_thread(). Thus td_altkstack saves
td_md.md_kstackvirt.


# 104939 11-Oct-2002 peter

cut/paste the pmap_new_altkstack stuff from the other platforms.
It's no different here. Update the rest of the kstack API's for scottl's
changes.


# 104938 11-Oct-2002 peter

Call uma_zalloc on pvzone with M_NOWAIT, just like i386 and alpha.
Otherwise we get hundreds of 'could sleep' during boot.


# 104320 01-Oct-2002 phk

Fix the same misinitialization of pmap_prefault_pageorder as on i386.

Suggeste by: jake


# 102837 02-Sep-2002 alc

o Remove an initialized but unused variable from pmap_remove_all().


# 102663 31-Aug-2002 peter

Do not use an object for the pte and pv zones on ia64 because it overrides
the pmap_allocf() function that we provide above. We still use the limits
via other means.

Submitted by: jeff


# 102399 25-Aug-2002 alc

o Retire pmap_pageable(). It's an advisory routine that none
of our platforms implements.


# 101645 10-Aug-2002 alc

o Remove the setting and clearing of the PG_MAPPED flag from the alpha and
ia64 pmap.
o Remove the PG_MAPPED flag's declaration.


# 101199 02-Aug-2002 alc

o Lock page queue accesses by vm_page_deactivate().


# 100543 23-Jul-2002 arr

- Pass the VM_ALLOC_WIRED flag to vm_page_alloc() in pmap_growkernel() so
that we can avoid a call to vm_page_lock_queues().

Approved by: peter


# 100269 17-Jul-2002 peter

Fix some typos in 1.68 from over a week ago.


# 100268 17-Jul-2002 peter

Cap the initial PV and PTE table preallocations. Otherwise we explode
on the Itanium2 system I have when we use up *all* of the initial 256MB
direct mapped region before we are ready to dynamically expand it.

The machine that I have has 4 cpus and a very big hole in the middle.
This makes the bogus '(last_address - first_address) / PAGE_SIZE'
calculations especially dangerous and caused many millions of initial
PV/PTE's to be preallocated.


# 100000 14-Jul-2002 alc

o Lock page queue accesses by vm_page_wire().


# 99571 08-Jul-2002 peter

Add a special page zero entry point intended to be called via the single
threaded VM pagezero kthread outside of Giant. For some platforms, this
is really easy since it can just use the direct mapped region. For others,
IPI sending is involved or there are other issues, so grab Giant when
needed.

We still have preemption issues to deal with, but Alan Cox has an
interesting suggestion on how to minimize the problem on x86.

Use Luigi's hack for preserving the (lack of) priority.

Turn the idle zeroing back on since it can now actually do something useful
outside of Giant in many cases.


# 99559 07-Jul-2002 peter

Collect all the (now equivalent) pmap_new_proc/pmap_dispose_proc/
pmap_swapin_proc/pmap_swapout_proc functions from the MD pmap code
and use a single equivalent MI version. There are other cleanups
needed still.

While here, use the UMA zone hooks to keep a cache of preinitialized
proc structures handy, just like the thread system does. This eliminates
one dependency on 'struct proc' being persistent even after being freed.
There are some comments about things that can be factored out into
ctor/dtor functions if it is worth it. For now they are mostly just
doing statistics to get a feel of how it is working.


# 99422 04-Jul-2002 peter

Back out proc part of last commit. UMA manages the thread cache only, and
we just have to deal with the kstack when told to. We do not have a
UMA-managed cache for the proc struct and its associated upage yet. So,
go back to the old lazy mechanism. Note that if UMA destroys pages that
used to contain proc structures, we'll lose the corresponding upage
forever. (zones never did this - once a page was allocated, it stayed
attached to the proc zone forever)


# 99421 04-Jul-2002 peter

Copy from sparc64/pmap.c rev 1.64 (Retrofit changes from i386/pmap.c
rev 1.328-1.331.) but for uarea only. We still have our own broken
kstack code here.


# 98773 24-Jun-2002 dfr

Add UMA_ZONE_VM flag to the zones which are used for pmap_enter().


# 98471 20-Jun-2002 peter

panic rather than fault and explode if we fail to contigmalloc a kernel
stack. This is still bad(TM), but at least we have a clue when we get
hit when contigmalloc fails.


# 98470 20-Jun-2002 peter

Use the canonical pmap_{new,dispose,swapin,swapout}_proc() functions,
in this case cut/pasted from sparc64 instead of messing with
contigmalloc where it is not needed.


# 96029 04-May-2002 dfr

Use region 7 addresses for the slabs in the PV and PT zones so that we
don't confuse the zone allocater by translating region 5 addresses to
region 7 addresses (which is unavoidable for PTEs).


# 96019 04-May-2002 marcel

Make sure we don't index the pm_rid array out of bounds in
pmap_ensure_rid(). This can happen because the function is
called for both user and kernel addresses, while the rid array
only has room for user addresses. This bug got exposed by rev
1.58 of ia64/ia64/pmap.c and rev 1.8 of ia64/include/pmap.h.


# 95920 02-May-2002 marcel

In pmap_pinit0, remove duplicate initialization.


# 95710 29-Apr-2002 peter

Tidy up some loose ends.
i386/ia64/alpha - catch up to sparc64/ppc:
- replace pmap_kernel() with refs to kernel_pmap
- change kernel_pmap pointer to (&kernel_pmap_store)
(this is a speedup since ld can set these at compile/link time)
all platforms (as suggested by jake):
- gc unused pmap_reference
- gc unused pmap_destroy
- gc unused struct pmap.pm_count
(we never used pm_count - we track address space sharing at the vmspace)


# 94779 15-Apr-2002 peter

Fix an "oops!" that turned out to be mostly harmless (but gave a warning).
I did this right on the sparc64. Store the direct mapped addresses in
the correct variables.

Submitted by: jake


# 94777 15-Apr-2002 peter

Pass vm_page_t instead of physical addresses to pmap_zero_page[_area]()
and pmap_copy_page(). This gets rid of a couple more physical addresses
in upper layers, with the eventual aim of supporting PAE and dealing with
the physical addressing mostly within pmap. (We will need either 64 bit
physical addresses or page indexes, possibly both depending on the
circumstances. Leaving this to pmap itself gives more flexibilitly.)

Reviewed by: jake
Tested on: i386, ia64 and (I believe) sparc64. (my alpha was hosed)


# 94364 10-Apr-2002 dfr

Initialise PCPU_GET(current_pmap) in pmap_bootstrap - cpu_switch needs
to be sure that it is always correct and this was not true for the first
call to cpu_switch. When thread0 resumed later, it ended up calling
pmap_install with a null pmap, which is bad.


# 93818 04-Apr-2002 jhb

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


# 92870 21-Mar-2002 dfr

Change critical_t to register_t for intr_disable/restore.


# 92851 21-Mar-2002 jeff

Remove references to vm_zone.h and switch over to the new uma API.

Approved by: peter


# 92843 20-Mar-2002 alfred

Remove __P.

Reviewd by: peter


# 92805 20-Mar-2002 dfr

Change intr_enable to intr_restore for consistency with sparc64.


# 92782 20-Mar-2002 dfr

Replace calls to cpu_critical_enter/exit with appropriate calls to
either explicitly disable interrupts or use a real critical section,
as appropriate.


# 92696 19-Mar-2002 peter

#if 0 some unused variables (only in #if 0 code)


# 92654 19-Mar-2002 jeff

This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by: arch@


# 92552 18-Mar-2002 dfr

Fix spelling.


# 92263 14-Mar-2002 dfr

* Use a mutex to protect the RID allocator.
* Use ptc.g instead of ptc.l so that TLB shootdowns are broadcast to the
coherence domain.
* Use smp_rendezvous for pmap_invalidate_all to ensure it happens on all
cpus.
* Dike out a DIAGNOSTIC printf which didn't compile.
* Protect the internals of pmap_install with cpu_critical_enter/exit.


# 91403 27-Feb-2002 silby

Fix a horribly suboptimal algorithm in the vm_daemon.

In order to determine what to page out, the vm_daemon checks
reference bits on all pages belonging to all processes. Unfortunately,
the algorithm used reacted badly with shared pages; each shared page
would be checked once per process sharing it; this caused an O(N^2)
growth of tlb invalidations. The algorithm has been changed so that
each page will be checked only 16 times.

Prior to this change, a fork/sleepbomb of 1300 processes could cause
the vm_daemon to take over 60 seconds to complete, effectively
freezing the system for that time period. With this change
in place, the vm_daemon completes in less than a second. Any system
with hundreds of processes sharing pages should benefit from this change.

Note that the vm_daemon is only run when the system is under extreme
memory pressure. It is likely that many people with loaded systems saw
no symptoms of this problem until they reached the point where swapping
began.

Special thanks go to dillon, peter, and Chuck Cranor, who helped me
get up to speed with vm internals.

PR: 33542, 20393
Reviewed by: dillon
MFC after: 1 week


# 90361 07-Feb-2002 julian

Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,


# 88692 30-Dec-2001 marcel

Make vhpt_base and vhpt_size globals so that they can be used by
the AP startup code.


# 88245 20-Dec-2001 peter

Replace a bunch of:
for (pv = TAILQ_FIRST(&m->md.pv_list);
pv;
pv = TAILQ_NEXT(pv, pv_list)) {
with:
TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) {


# 88088 17-Dec-2001 jhb

Modify the critical section API as follows:
- The MD functions critical_enter/exit are renamed to start with a cpu_
prefix.
- MI wrapper functions critical_enter/exit maintain a per-thread nesting
count and a per-thread critical section saved state set when entering
a critical section while at nesting level 0 and restored when exiting
to nesting level 0. This moves the saved state out of spin mutexes so
that interlocking spin mutexes works properly.
- Most low-level MD code that used critical_enter/exit now use
cpu_critical_enter/exit. MI code such as device drivers and spin
mutexes use the MI wrappers. Note that since the MI wrappers store
the state in the current thread, they do not have any return values or
arguments.
- mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is
assigned to curthread->td_savecrit during fork_exit().

Tested on: i386, alpha


# 87119 30-Nov-2001 dfr

* Don't use critical_enter/critical_exit when accessing the VHPT - its
pointless and would be inadequate for SMP systems. We will rely on the
VM system's locks to serialise this for now.
* Change pmap_remove() so that if the range being removed is larger than
the number of pages mapped by the pmap, we iterate over the currently
mapped pages instead of over the virtual address range. This should
make a difference when removing large virtual address ranges from an
address space.


# 86443 16-Nov-2001 peter

Oops, I accidently merged a whitespace error from the original commit.
(whitespace at end of line in rev 1.264 pmap.c). Fix them all.


# 86442 16-Nov-2001 peter

Merge rev 1.264 from i386/pmap.c (tegge via alfred):
Protect against an infinite loop when prefaulting pages. This can
happen when the vm system maps past the end of an object or tries
to map a zero length object, the pmap layer misses the fact that
offsets wrap into negative numbers and we get stuck.


# 86441 16-Nov-2001 peter

Merge rev 1.202 from i386/pmap.c (back in 1998 by John Dyson):
Make flushing dirty pages work correctly on filesystems that
unexpectedly do not complete writes even with sync I/O requests.
This should help the behavior of mmaped files when using
softupdates (and perhaps in other circumstances also.)


# 86440 16-Nov-2001 peter

Merge rev 1.293 of i386/pmap.c - skip PG_UNMANAGED in pmap_collect()


# 86438 16-Nov-2001 peter

Converge with i386/pmap.c - dont refer to curproc, use curthread.


# 86435 15-Nov-2001 peter

As part of a general cleanup and reconvergence of related pmap code,
start tidying up some loose ends. The DEBUG_VA stuff has long since
passed its use-by date. It wasn't used on ia64 but got cut/pasted there.


# 86213 09-Nov-2001 dfr

* Make sure we increment pm_stats.resident_count in pmap_enter_quick
* Re-organise RID allocation so that we don't accidentally give a RID
to two different processes. Also randomise the order to try to reduce
collisions in VHPT and TLB. Don't allocate RIDs for regions which are
unused.
* Allocate space for VHPT based on the size of physical memory. More
tuning is needed here.
* Add sysctl instrumentation for VHPT - see sysctl vm.stats.vhpt
* Fix a bug in pmap_prefault() which prevented it from actually adding
pages to the pmap.
* Remove ancient dead debugging code.
* Add DDB commands for examining translation registers and region
registers.

The first change fixes the 'free/cache page %p was dirty' panic which I
have been seeing when the system is put under moderate load. It also
fixes the negative RSS values in ps which have been confusing me for a
while.

With this set of changes the ia64 port is reliable enough to build its
own kernels, even with a 20-way parallel build. Next stop buildworld.


# 85930 02-Nov-2001 dillon

Implement i386/i386/pmap.c 1.292 for alpha, ia64 (avoid free
page exhaustion / kernel panic for certain madvise() scenarios)


# 85856 02-Nov-2001 dfr

Remember to actually free the pv_entry in pmap_remove_entry().


# 85439 24-Oct-2001 dfr

* Clear the TLB on boot.
* If a pte for a location given to pmap_enter_quick is valid, just give
up - don't panic, even if the mapping is different.


# 85142 19-Oct-2001 dfr

Rework pmap so that it separates the PTE structure from the pv_entry
structure. This makes it possible to pre-allocate PTEs for the kernel,
which is necessary for a reliable implementation of pmap_kenter(). This
also avoids wasting space (about 48 bytes per page) for kernel mappings
and user mappings of memory-mapped devices.

This also fixes a bug with the previous version where the implementation
required the pv_entry structure to be physically contiguous but did not
enforce this (the structure size was not a power of two). This meant
that the pv_entry free list was quickly corrupted as soon as the system
was even mildly loaded.


# 85024 16-Oct-2001 dfr

Size the number of pv_entries we use to bootstrap the pv_entry allocator
based on the size of physical memory. This should eliminate the tweaking
needed for larger memory configurations.


# 84555 05-Oct-2001 dfr

Use physical addresses, not virtual addresses when calling PHYS_TO_VM_PAGE.


# 84127 29-Sep-2001 dfr

* Read parameters for ptc.e instruction from PAL Code.
* Add pmap_unmapdev().


# 83907 24-Sep-2001 dfr

Increase the number of bootstrap PVs.


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 83197 07-Sep-2001 dfr

Typo in comment.


# 83196 07-Sep-2001 dfr

* Track ref/mod information properly when a mapping changes.
* Fix a panic in pmap_remove() for a non-current pmap.


# 82632 31-Aug-2001 peter

Same as i386/i386/pmap.c: clean up some style. This is irrelevant since
it is inside #if 0'ed code, but it would be a shame if this stuff got
cut/pasted elsewhere.


# 80431 26-Jul-2001 peter

Make PMAP_SHPGPERPROC tunable. One shouldn't need to recompile a kernel
for this, since it is easy to run into with large systems with lots of
shared mmap space.

Obtained from: yahoo


# 77448 29-May-2001 jhb

- Catch up to the VM mutex changes.
- Sort includes in a few places.


# 75668 18-Apr-2001 dfr

Don't panic when we try to modify the kernel pmap.


# 74927 28-Mar-2001 jhb

Convert the allproc and proctree locks from lockmgr locks to sx locks.


# 74903 28-Mar-2001 jhb

Switch from save/disable/restore_intr() to critical_enter/exit().


# 73936 07-Mar-2001 jhb

Unrevert the pmap_map() changes. They weren't broken on x86.

Sense beaten into me by: peter


# 73903 06-Mar-2001 jhb

Back out the pmap_map() change for now, it isn't completely stable on the
i386.


# 73862 06-Mar-2001 jhb

- Rework pmap_map() to take advantage of direct-mapped segments on
supported architectures such as the alpha. This allows us to save
on kernel virtual address space, TLB entries, and (on the ia64) VHPT
entries. pmap_map() now modifies the passed in virtual address on
architectures that do not support direct-mapped segments to point to
the next available virtual address. It also returns the actual
address that the request was mapped to.
- On the IA64 don't use a special zone of PV entries needed for early
calls to pmap_kenter() during pmap_init(). This gets us in trouble
because we end up trying to use the zone allocator before it is
initialized. Instead, with the pmap_map() change, the number of needed
PV entries is small enough that we can get by with a static pool that is
used until pmap_init() is complete.

Submitted by: dfr
Debugging help: peter
Tested by: me


# 71350 21-Jan-2001 des

First step towards an MP-safe zone allocator:
- have zalloc() and zfree() always lock the vm_zone.
- remove zalloci() and zfreei(), which are now redundant.

Reviewed by: bmilekic, jasone


# 69947 12-Dec-2000 jake

- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr. Also provides macros for the flags
pased to specify shared, exclusive or release which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.


# 69022 22-Nov-2000 jake

Protect the following with a lockmgr lock:

allproc
zombproc
pidhashtbl
proc.p_list
proc.p_hash
nextpid

Reviewed by: jhb
Obtained from: BSD/OS and netbsd


# 67247 17-Oct-2000 ps

Implement write combining for crashdumps. This is useful when
write caching is disabled on both SCSI and IDE disks where large
memory dumps could take up to an hour to complete.

Taking an i386 scsi based system with 512MB of ram and timing (in
seconds) how long it took to complete a dump, the following results
were obtained:

Before: After:
WCE TIME WCE TIME
------------------ ------------------
1 141.820972 1 15.600111
0 797.265072 0 65.480465

Obtained from: Yahoo!
Reviewed by: peter


# 67213 16-Oct-2000 dfr

In pmap_remove_pv(), only manipulate the page's list if the pv is
managed.


# 67017 12-Oct-2000 dfr

* Allocate kernel stacks with contigmalloc() to make exception handling
safe - we can't afford to take a TLB trap when we are writing a
trapframe. Possibly revisit this later.
* Various fixes to pmap_enter() so that it actually works properly.


# 66937 10-Oct-2000 dfr

* Add rudimentary DDB support (no kgdb, no backtrace, no single step).
* Track recent changes to SWI code.
* Allocate RIDs for pmaps (untested).
* Implement assembler version of cpu_switch - its cleaner that way.


# 66633 04-Oct-2000 dfr

Next round of fixes to the ia64 code. This includes simulated clock and
disk drivers along with a load of fixes to context switching, fork
handling and a load of other stuff I can't remember now. This takes us as
far as start_init() before it dies. I guess now I will have to finish off
the VM system and syscall handling :-).


# 66486 30-Sep-2000 dfr

Next round of ia64 work, including fixes to context switching,
implementing cpu_fork(), copy*str(), bcopy(), copy{in,out}(). With these
changes, my test kernel reaches the mountroot prompt.


# 66463 29-Sep-2000 dfr

Implement dirty and access bit exceptions.


# 66458 29-Sep-2000 dfr

This is the first snapshot of the FreeBSD/ia64 kernel. This kernel will
not work on any real hardware (or fully work on any simulator). Much more
needs to happen before this is actually functional but its nice to see
the FreeBSD copyright message appear in the ia64 simulator.