Cross Reference: /freebsd-9.3-release/sys/ia64/ia64/pmap.c

History log of /freebsd-9.3-release/sys/ia64/ia64/pmap.c
Revision	Date	Author	Comments (<<< Hide modified files) (Show modified files >>>)
# 267654	19-Jun-2014	gjb	Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation /freebsd-9.3-release
# 251963	18-Jun-2013	marius	MFC: r238190 Implement ia64_physmem_alloc() and use it consistently to get memory before VM has been initialized.
# 251897	18-Jun-2013	scottl	Merge the second part of the unmapped I/O changes. This enables the infrastructure in the block layer and UFS filesystem as well as a few drivers. The list of MFC revisions is long, so I won't quote changelogs. r248508,248510,248511,248512,248514,248515,248516,248517,248518, 248519,248520,248521,248550,248568,248789,248790,249032,250936 Submitted by: kib Approved by: kib Obtained from: Netflix
# 248814	28-Mar-2013	kib	MFC r248280: Add pmap function pmap_copy_pages().
# 248085	09-Mar-2013	marius	MFC: r227309 (partial) Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
# 240238	08-Sep-2012	kib	MFC r239065: Stop including vm_param.h into vm_page.h. Include vm_param.h explicitely for the kernel code which needs it.
# 226319	12-Oct-2011	kib	Handle page dirty mask with atomics. MFC r225838: Use the explicitly-sized types for the dirty and valid masks. MFC r225840: Use the trick of performing the atomic operation on the contained aligned word to handle the dirty mask updates in vm_page_clear_dirty_mask(). MFC 225841 Remove locking of the vm page queues from several pmaps. MFC r225843: Fix grammar. MFC r225856: Style nit. Approved by: re (bz)
# 225736	22-Sep-2011	kensmith	Copy head to stable/9 as part of 9.0-RELEASE release cycle. Approved by: re (implicit)
# 225418	06-Sep-2011	kib	Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify afalgs. Document the changes to flags field to only require the page lock. Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced. Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)
# 224746	09-Aug-2011	kib	- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag to VPO_UNMANAGED (and also making the flag protected by the vm object lock, instead of vm page queue lock). - Mark the fake pages with both PG_FICTITIOUS (as it is now) and VPO_UNMANAGED. As a consequence, pmap code now can use use just VPO_UNMANAGED to decide whether the page is unmanaged. Reviewed by: alc Tested by: pho (x86, previous version), marius (sparc64), marcel (arm, ia64, powerpc), ray (mips) Sponsored by: The FreeBSD Foundation Approved by: re (bz)
# 224663	05-Aug-2011	marcel	Follow-up commit: refactor pmap_kextract() to make it easier to catch and debug issues like the one fixed in the previous commit: Replace all return statements with goto statements so that we end up at a single place with a value for the physical address. Print a message for all unknown KVA addresses. Approved by: re (blanket)
# 224662	05-Aug-2011	marcel	Remove stray semicolon in pmap_kextract() that turned the conditional "return (0)" into an unconditional one and as such broke PBVM address queries -- such as during kernel core dumps. Approved by: re (blanket)
# 224116	16-Jul-2011	marcel	Don't assume pmap_mapdev() gets called only for memory mapped I/O addresses (i.e. uncacheable). ACPI in particular uses pmap_mapdev() rather excessively (number of calls) just to get a valid KVA. To that end, have pmap_mapdev(): 1. cache the last result so that we don't waste time for multiple consecutive invocations with the same PA/SZ. 2. find the memory descriptor that covers the PA and return NULL if none was found or when the PA is for a common DRAM address. 3. Use either a region 6 or region 7 KVA, in accordance with the memory attribute.
# 223873	08-Jul-2011	marcel	Implement basic support for memory attributes. At this time we only distinguish between UC and WB memory so that we can map the page to either a region 6 address (for UC) or a region 7 address (for WB). This change is only now possible, because previously we would map regions 6 and 7 with 256MB translations and on top of that had the kernel mapped in region 7 using a wired translation. The introduction of the PBVM moved the kernel into its own region and freed up region 7 and allowed us to revert to standard page-sized translations. This commit inroduces pmap_page_to_va() that respects the attribute.
# 223732	02-Jul-2011	alc	When iterating over a paging queue, explicitly check for PG_MARKER, instead of relying on zeroed memory being interpreted as an empty PV list. Reviewed by: kib
# 223700	30-Jun-2011	marcel	Change the management of nested faults by switching to physical addressing while reading or writing the trap frame. It's not possible to guarantee that the one translation cache entry that we depend on is not going to get purged by the CPU. We already know that global shootdowns (ptc.g and/or ptc.ga) can (and will) cause multiple TC entries to get purged and we initialize tried to handle that by serializing kernel entry with these operations. However, we need to serialize kernel exit as well. But even if we can serialize, it appears that CPU threads within a core can affect each other's TC entries beyond the global shootdown. This would mean serializing any and all translatation cache updates with the threads in a core with the kernel entry and exit of any thread in that core. This is just too painful and complicated. Since we already properly coded for the 2 nested faults that we can get, all we need to do is use those to obtain the physical address of the trap frame, switch to physical mode and in that way eliminate any further faults. The trap frame is already aligned to 1KB boundaries to make sure we don't cross the page boundary, this is safe to do. We still need to serialize ptc.g or ptc.ga across CPUs because the platform can only have 1 such operation outstanding at the same time. We can now use a regular (spin) lock for this. Also, it has been observed that we can get a nested TLB faults for region 7 virtual addresses. This was unexpected. For now, we enhance the nested TLB fault handler to deal with those as well, but it needs to be understood.
# 223677	29-Jun-2011	alc	Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this option to vm_object_page_remove() asserts that the specified range of pages is not mapped, or more precisely that none of these pages have any managed mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on the pages. This change not only saves time by eliminating pointless calls to pmap_remove_all(), but it also eliminates an inconsistency in the use of pmap_remove_all() versus related functions, like pmap_remove_write(). It eliminates harmless but pointless calls to pmap_remove_all() that were being performed on PG_UNMANAGED pages. Update all of the existing assertions on pmap_remove_all() to reflect this change. Reviewed by: kib
# 223170	17-Jun-2011	marcel	Properly serialize the global shootdown with the instruction stream of the local processor. Also explicitly invalidate the ALAT. This is done on the other CPUs in the coherence domain by virtue of the ptc.ga instruction, but does not apply to the local CPU.
# 222531	31-May-2011	nwhitehorn	On multi-core, multi-threaded PPC systems, it is important that the threads be brought up in the order they are enumerated in the device tree (in particular, that thread 0 on each core be brought up first). The SLIST through which we loop to start the CPUs has all of its entries added with SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration and so AP startup would always fail in such situations (causing a machine check or RTAS failure). Fix this by changing the SLIST into an STAILQ, and inserting new CPUs at the end. Reviewed by: jhb
# 221606	07-May-2011	marcel	In pmap_kextract(), return the physical address for PBVM virtual addresses as well (incl. the PBVM page table).
# 221271	30-Apr-2011	marcel	Stop linking against a direct-mapped virtual address and instead use the PBVM. This eliminates the implied hardcoding of the physical address at which the kernel needs to be loaded. Using the PBVM makes it possible to load the kernel irrespective of the physical memory organization and allows us to replicate kernel text on NUMA machines. While here, reduce the direct-mapped page size to the kernel's page size so that we can support memory attributes better.
# 219841	21-Mar-2011	marcel	Fix switching to physical mode as part of calling into EFI runtime services or PAL procedures. The new implementation is based on specific functions that are known to be called in certain scenarios only. This in particular fixes the PAL call to obtain information about translation registers. In general, the new implementation does not bank on virtual addresses being direct-mapped and will work when the kernel uses PBVM. When new scenarios need to be supported, new functions are added if the existing functions cannot be changed to handle the new scenario. If a single generic implementation is possible, it will become clear in due time. While here, change bootinfo to a pointer type in anticipation of future development.
# 219808	20-Mar-2011	marcel	Change region 4 to be part of the kernel. This serves 2 purposes: 1. The PBVM is in region 4, so if we want to make use of it, we need region 4 freed up. 2. Region 4 and above cannot be represented by an off_t by virtue of that type being signed. This is problematic for truss(1), ktrace(1) and other such programs.
# 209085	11-Jun-2010	marcel	The ptc.g operation for the Mckinley and Madison processors has the side-effect of purging more than the requested translation. While this is not a problem in general, it invalidates the assumption made during constructing the trapframe on entry into the kernel in SMP configurations. The assumption is that only the first store to the stack will possibly cause a TLB miss. Since the ptc.g purges the translation caches of all CPUs in the coherency domain, a ptc.g executed on one CPU can cause a purge on another CPU that is currently running the critical code that saves the state to the trapframe. This can cause an unexpected TLB miss and with interrupt collection disabled this means an unexpected data nested TLB fault. A data nested TLB fault will not save any context, nor provide a way for software to determine what caused the TLB miss nor where it occured. Careful construction of the kernel entry and exit code allows us to handle a TLB miss in precisely orchastrated points and thereby avoiding the need to wire the kernel stack, but the unexpected TLB miss caused by the ptc.g instructution resulted in an unrecoverable condition and resulting in machine checks. The solution to this problem is to synchronize the kernel entry on all CPUs with the use of the ptc.g instruction on a single CPU by implementing a bare-bones readers-writer lock that allows N readers (= N CPUs entering the kernel) and 1 writer (= execution of the ptc.g instruction on some CPU). This solution wins over a rendez-vous approach by not interrupting CPUs with an IPI. This problem has not been observed on the Montecito. PR: ia64/147772 MFC after: 6 days
# 209048	11-Jun-2010	alc	Relax one of the new assertions in pmap_enter() a little. Specifically, allow pmap_enter() to be performed on an unmanaged page that doesn't have VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages. (See, for example, pmap_remove_write().)
# 208990	10-Jun-2010	alc	Reduce the scope of the page queues lock and the number of PG_REFERENCED changes in vm_pageout_object_deactivate_pages(). Simplify this function's inner loop using TAILQ_FOREACH(), and shorten some of its overly long lines. Update a stale comment. Assert that PG_REFERENCED may be cleared only if the object containing the page is locked. Add a comment documenting this. Assert that a caller to vm_page_requeue() holds the page queues lock, and assert that the page is on a page queue. Push down the page queues lock into pmap_ts_referenced() and pmap_page_exists_quick(). (As of now, there are no longer any pmap functions that expect to be called with the page queues lock held.) Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever be passed an unmanaged page. Assert this rather than returning "0" and "FALSE" respectively. ARM: Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH(). Push down the page queues lock inside of pmap_clearbit(), simplifying pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write(). Additionally, this allows for avoiding the acquisition of the page queues lock in some cases. PowerPC/AIM: moea_page_exits_quick() and moea_page_wired_mappings() will never be called before pmap initialization is complete. Therefore, the check for moea_initialized can be eliminated. Push down the page queues lock inside of moea_clear_bit(), simplifying moea_clear_modify() and moea_clear_reference(). The last parameter to moea_clear_bit() is never used. Eliminate it. PowerPC/BookE: Simplify mmu_booke_page_exists_quick()'s control flow. Reviewed by: kib@
# 208659	30-May-2010	alc	Simplify the inner loop of get_pv_entry(): While iterating over the page's pv list, there is no point in checking whether or not the pv list is empty, wait instead until the loop completes.
# 208646	29-May-2010	alc	Don't set PG_WRITEABLE in pmap_enter() unless the page is managed.
# 208574	26-May-2010	alc	Push down page queues lock acquisition in pmap_enter_object() and pmap_is_referenced(). Eliminate the corresponding page queues lock acquisitions from vm_map_pmap_enter() and mincore(), respectively. In mincore(), this allows some additional cases to complete without ever acquiring the page queues lock. Assert that the page is managed in pmap_is_referenced(). On powerpc/aim, push down the page queues lock acquisition from moea_is_modified() and moea_is_referenced() into moea*_query_bit(). Again, this will allow some additional cases to complete without ever acquiring the page queues lock. Reorder a few statements in vm_page_dontneed() so that a race can't lead to an old reference persisting. This scenario is described in detail by a comment. Correct a spelling error in vm_page_dontneed(). Assert that the object is locked in vm_page_clear_dirty(), and restrict the page queues lock assertion to just those cases in which the page is currently writeable. Add object locking to vnode_pager_generic_putpages(). This was the one and only place where vm_page_clear_dirty() was being called without the object being locked. Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call to vm_page_clear_dirty(). Change vnode_pager_generic_putpages() to the modern-style of function definition. Also, change the name of one of the parameters to follow virtual memory system naming conventions. Reviewed by: kib
# 208504	24-May-2010	alc	Roughly half of a typical pmap_mincore() implementation is machine- independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore(). Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page. Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information. Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock. Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed(). Reviewed by: kib (an earlier version)
# 208175	16-May-2010	alc	On entry to pmap_enter(), assert that the page is busy. While I'm here, make the style of assertion used by pmap_enter() consistent across all architectures. On entry to pmap_remove_write(), assert that the page is neither unmanaged nor fictitious, since we cannot remove write access to either kind of page. With the push down of the page queues lock, pmap_remove_write() cannot condition its behavior on the state of the PG_WRITEABLE flag if the page is busy. Assert that the object containing the page is locked. This allows us to know that the page will neither become busy nor will PG_WRITEABLE be set on it while pmap_remove_write() is running. Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly do copy-on-write-based zero-copy transmit on unmanaged or fictitious pages, so don't even try. Previously, the call to pmap_remove_write() would have failed silently.
# 207796	08-May-2010	alc	Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write(). Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.) Switch to a per-processor counter for the total number of pages cached.
# 207410	29-Apr-2010	kmacy	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib
# 207373	29-Apr-2010	alc	MFamd64/i386 r207205 Clearing a page table entry's accessed bit and setting the page's PG_REFERENCED flag in pmap_protect() can't really be justified, so don't do it. Moreover, on ia64, don't set the page's dirty field unless pmap_protect() is removing write access.
# 207155	24-Apr-2010	alc	Resurrect pmap_is_referenced() and use it in mincore(). Essentially, pmap_ts_referenced() is not always appropriate for checking whether or not pages have been referenced because it clears any reference bits that it encounters. For example, in mincore(), clearing the reference bits has two negative consequences. First, it throws off the activity count calculations performed by the page daemon. Specifically, a page on which mincore() has called pmap_ts_referenced() looks less active to the page daemon than it should. Consequently, the page could be deactivated prematurely by the page daemon. Arguably, this problem could be fixed by having mincore() duplicate the activity count calculation on the page. However, there is a second problem for which that is not a solution. In order to clear a reference on a 4KB page, it may be necessary to demote a 2/4MB page mapping. Thus, a mincore() by one process can have the side effect of demoting a superpage mapping within another process!
# 205454	22-Mar-2010	marcel	o Remove the pmap argument to pmap_invalidate_all() as it's not used other than in a potentially dangerous KASSERT. o Hand-inline pmap_remove_page() as it's only called from 1 place and the abstraction that pmap_remove_page() provides is not enough to warrant the obfuscation. Eliminate the dangerous KASSERT in the process. o In pmap_remove_pte(), remove the KASSERT for pmap being the current one as it's not safe in the face of CPU migration.
# 205435	22-Mar-2010	marcel	Drop the pmap argument to pmap_invalidate_page(). It's not used other than in a KASSERT. The KASSERT is broken in that it's done outside the critical section and as such isn't protected against CPU migration. Improve pmap_invalidate_page() as follows: o calculate vhpt_ofs inside the critical region for exactly the same reason. o calculate the tag outside the FOREACH loop, as it's loop-invariant. This is more efficient. o Replace the test and set with an atomic cmpset operation because we are changing other CPU's VHPT tables and this avoids invalidating after the entry got modified. Not necessarily a problem, but better safe than sorry.
# 204182	21-Feb-2010	marcel	Remove pm_active from struct pmap as it serves no purpose. MFC after: 1 week
# 203883	14-Feb-2010	marcel	Some code churn: o Eliminate IA64_PHYS_TO_RR6 and change all places where the macro is used by calling either bus_space_map() or pmap_mapdev(). o Implement bus_space_map() in terms of pmap_mapdev() and implement bus_space_unmap() in terms of pmap_unmapdev(). o Have ia64_pib hold the uncached virtual address of the processor interrupt block throughout the kernel's life and access the elements of the PIB through this structure pointer. This is a non-functional change with the exception of using ia64_ld1() and ia64_st8() to write to the PIB. We were still using assignments, for which the compiler generates semaphore reads -- which cause undefined behaviour for uncacheable memory. Note also that the memory barriers in ipi_send() are critical for proper functioning. With all the mapping of uncached memory done by pmap_mapdev(), we can keep track of the translations and wire them in the CPU. This then eliminates the need to reserve a whole region for uncached I/O and it eliminates translation traps for device I/O accesses.
# 200207	07-Dec-2009	marcel	Define struct pcpu_md as the only MD field of struct pcpu (pc_acpi_id excluded, as it's used by MI code) and mode the sysctl variables from pcpu_stats to pcpu_md. Adjust all references accordingly. While nearby, change the PCPU sysctl tree so that they match the CPU device sysctl tree -- they are now children of a static node called "machdep.cpu" and are named only with their cpu ID.
# 200200	06-Dec-2009	marcel	Allocate the VHPT for each CPU in cpu_mp_start(), rather than allocating MAXCPU VHPTs up-front. This allows us to max-out MAXCPU without memory waste -- MAXCPU is now 32 for SMP kernels. This change also eliminates the VHPT scaling based in the total memory in the system. It's the workload that determines the best size of the VHPT. The workload can be affected by the amount of memory, but not necessarily. For example, there's no performance difference between VHPT sizes of 256KB, 512KB and 1MB when building the LINT kernel. This was observed with a system that has 8GB of memory. By default the kernel will allocate a 1MB VHPT. The user can tune the system with the "machdep.vhpt.log2size" tunable.
# 198341	21-Oct-2009	marcel	o Introduce vm_sync_icache() for making the I-cache coherent with the memory or D-cache, depending on the semantics of the platform. vm_sync_icache() is basically a wrapper around pmap_sync_icache(), that translates the vm_map_t argumument to pmap_t. o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc it replaces the pmap_page_executable() function, added to solve the I-cache problem in uiomove_fromphys(). o In proc_rwmem() call vm_sync_icache() when writing to a page that has execute permissions. This assures that when breakpoints are written, the I-cache will be coherent and the process will actually hit the breakpoint. o This also fixes the Book-E PMAP implementation that was missing necessary locking while trying to deal with the I-cache coherency in pmap_enter() (read: mmu_booke_enter_locked). The key property of this change is that the I-cache is made coherent after writes have been done. Doing it in the PMAP layer when adding or changing a mapping means that the I-cache is made coherent before any writes happen. The difference is key when the I-cache prefetches.
# 195840	24-Jul-2009	jhb	Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to a device pager (OBJT_DEVICE) object in that it uses fictitious pages to provide aliases to other memory addresses. The primary difference is that it uses an sglist(9) to determine the physical addresses for a given offset into the object instead of invoking the d_mmap() method in a device driver. Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks
# 195625	11-Jul-2009	marcel	On exec(2), when loading the ELF image, pmap_enter_object() is called to prefault pages. This is an obvious place for making sure the I-cache is coherent. It was missing though. As such, execution over NFS and ZFS file systems was failing. NFS was fixed the wrong way (by flushing the D-cache as part of the NFS code) in a previous commit. ZFS problems were encountered after that and indicated that something else was wrong... Approved by: re (kib)
# 192324	18-May-2009	marcel	Rename ia64_invalidate_icache() to ia64_sync_icache(). We're not invalidating anything.
# 187381	18-Jan-2009	alc	Correct an error in revision 1.170 of this file. When get_pv_entry() is forced to reclaim pv entries, the one pv entry that it returns should not be freed.
# 179190	22-May-2008	marcel	Create the bucket mutexes with MTX_NOWITNESS. There's now a hard limit of 512 pending mutexes in the witness code and we can easily have 1 million bucket mutexes initialized before witness is up and running. Bumping the limit from 512 to 1M is not really an option here...
# 179081	18-May-2008	alc	Retire pmap_addr_hint(). It is no longer used.
# 178893	09-May-2008	alc	Add a stub for pmap_align_superpage() on machines that don't (yet) implement pmap-level support for superpages.
# 178309	19-Apr-2008	marcel	Sanitize the malloc types: M_PMAP is not used in pmap.c, so don't define it there. Don't use M_PMAP in mp_machdep.c; define M_SMP instead.
# 177769	30-Mar-2008	marcel	Better implement I-cache invalidation. The previous implementation was a kluge. This implementation matches the behaviour on powerpc and sparc64. While on the subject, make sure to invalidate the I-cache after loading a kernel module. MFC after: 2 weeks
# 176286	14-Feb-2008	marcel	On Montecito processors, the instruction cache is in fact not coherent with the data caches. Implement a quick fix to allow us to boot on Montecito, while I'm working on a better fix in the mean time. Commit made on Montecito-based Itanium...
# 175067	03-Jan-2008	alc	Add an access type parameter to pmap_enter(). It will be used to implement superpage promotion. Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is a Boolean.
# 175066	03-Jan-2008	imp	Use correct function name in panic message
# 175065	03-Jan-2008	imp	Fix obsolete comment. pmap_remove_all is the function we're in.
# 173708	17-Nov-2007	alc	Prevent the leakage of wired pages in the following circumstances: First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated. Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the pages beyond the EOF are unmapped and freed. However, when the file is mlock(2)ed, the pages beyond the EOF are unmapped but not freed because they have a non-zero wire count. This can be a mistake. Specifically, it is a mistake if the sole reason why the pages are wired is because of wired, managed mappings. Previously, unmapping the pages destroys these wired, managed mappings, but does not reduce the pages' wire count. Consequently, when the file is unmapped, the pages are not unwired because the wired mapping has been destroyed. Moreover, when the vm object is finally destroyed, the pages are leaked because they are still wired. The fix is to reduce the pages' wired count by the number of wired, managed mappings destroyed. To do this, I introduce a new pmap function pmap_page_wired_mappings() that returns the number of managed mappings to the given physical page that are wired, and I use this function in vm_object_page_remove(). Reviewed by: tegge MFC after: 6 weeks
# 173361	05-Nov-2007	kib	Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when the kmem_alloc_nofault() failed to allocate address space. Both functions now return error instead of panicing or dereferencing NULL. As consequence, vmspace_exec() and vmspace_unshare() returns the errno int. struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done. The kernel stack for the thread is now set up in the thread_alloc(), that itself may return NULL. Also, allocation of the first process thread is performed in the fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup() called from fork1(), and proc_linkup0(), that is used to set up the kernel process (was known as swapper). In collaboration with: Peter Holm Reviewed by: jhb
# 171721	04-Aug-2007	marcel	Replace "__asm __volatile()" by equivalent support functions from ia64_cpu.h. This improves readability and consistency and aids in auditing the code. Add data-serialization after writing to the region registers and add instruction-serialization after writing to cr.pta. Approved by: re (blanket)
# 171663	30-Jul-2007	marcel	Explicitly map the VHPT on all processors. Previously we were merely lucky that the VHPT was mapped as a side-effect of mapping the kernel, but when there's enough physical memory, this may not at all be the case. Approved by: re (blanket)
# 170402	07-Jun-2007	marcel	Eliminate pmap_install(), which was used to wrap pmap_switch() and grab sched_lock. This would serialize calls to pmap_switch from cpu_switch(). With the introduction of thread_lock, this is not possible anymore, because thread_lock is not a single lock. It varies. Secondly and most importantly, it's not needed at all. The only requirement for pmap_switch() is that it's not preempted while in the middle of updating the CPU and PCPU. In other words, it's a critical region. No locking required.
# 170307	04-Jun-2007	jeff	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
# 170170	31-May-2007	attilio	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)
# 170026	27-May-2007	marcel	Have the processor defer all faults and exceptions for control speculative loads. This at least makes control speculative loads work. In the future we should analyze which faults/exceptions we want to handle rather than defer to avoid having to call the recovery code when it's not strictly necessary.
# 169773	19-May-2007	marcel	Fix GCC warning: va = va += PAGE_SIZE contains pointless operation va = va. Fix white space in nearby lines.
# 169760	19-May-2007	marcel	Add a level of indirection to the kernel PTE table. The old scheme allowed for 1024 PTE pages, each containing 256 PTEs. This yielded 2GB of KVA. This is not enough to boot a kernel on a 16GB box and in general too low for a 64-bit machine. By adding a level of indirection we now have 1024 2nd-level directory pages, each capable of supporting 2GB of KVA. This brings the grand total to 2TB of KVA.
# 169667	18-May-2007	jeff	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>
# 166860	21-Feb-2007	alc	Change pmap_protect() so that execute access can be removed without simultaneously removing write access.
# 166631	11-Feb-2007	marcel	Now that the free page queue mutex is a sleep mutex, we cannot call vm_page_alloc() from within a critical section in pmap_growkernel(). Since the need for a critical section may never have existed in the first place, simply get rid of it. Discussed with: alc@
# 164229	12-Nov-2006	alc	Make pmap_enter() responsible for setting PG_WRITEABLE instead of its caller. (As a beneficial side-effect, a high-contention acquisition of the page queues lock in vm_fault() is eliminated.)
# 163603	22-Oct-2006	alc	Eliminate unnecessary PG_BUSY tests.
# 160889	01-Aug-2006	alc	Complete the transition from pmap_page_protect() to pmap_remove_write(). Originally, I had adopted sparc64's name, pmap_clear_write(), for the function that is now pmap_remove_write(). However, this function is more like pmap_remove_all() than like pmap_clear_modify() or pmap_clear_reference(), hence, the name change. The higher-level rationale behind this change is described in src/sys/amd64/amd64/pmap.c revision 1.567. The short version is that I'm trying to clean up and fix our support for execute access. Reviewed by: marcel@ (ia64)
# 159971	27-Jun-2006	alc	Make several changes to pmap_enter_quick_locked(): 1. Make the caller responsible for performing pmap_install(). This reduces the number of times that pmap_install() is performed by pmap_enter_object() from twice per page to twice overall. 2. Don't block if pmap_find_pte() is unable to allocate a PTE. If it did block, then it might wind up mapping a cache page. Specifically, if pmap_enter_quick_locked() slept when called from pmap_enter_object(), the page daemon could change an active or inactive page into a cache page just before it was to be mapped. 3. Bail out of pmap_enter_quick_locked() if pv entries aren't plentiful. In other words, don't force the allocation of a pv entry if they aren't readily available. Reviewed by: marcel@
# 159627	14-Jun-2006	ups	Remove mpte optimization from pmap_enter_quick(). There is a race with the current locking scheme and removing it should have no measurable performance impact. This fixes page faults leading to panics in pmap_enter_quick_locked() on amd64/i386. Reviewed by: alc,jhb,peter,ps
# 159303	05-Jun-2006	alc	Introduce the function pmap_enter_object(). It maps a sequence of resident pages from the same object. Use it in vm_map_pmap_enter() to reduce the locking overhead of premapping objects. Reviewed by: tegge@
# 157680	12-Apr-2006	alc	Retire pmap_track_modified(). We no longer need it because we do not create managed mappings within the clean submap. To prevent regressions, add assertions blocking the creation of managed mappings within the clean submap. Reviewed by: tegge
# 157443	03-Apr-2006	peter	Remove the unused sva and eva arguments from pmap_remove_pages().
# 152630	20-Nov-2005	alc	Eliminate pmap_init2(). It's no longer used.
# 152359	13-Nov-2005	alc	In get_pv_entry() use PMAP_LOCK() instead of PMAP_TRYLOCK() when deadlock cannot possibly occur.
# 152224	09-Nov-2005	alc	Reimplement the reclamation of PV entries. Specifically, perform reclamation synchronously from get_pv_entry() instead of asynchronously as part of the page daemon. Additionally, limit the reclamation to inactive pages unless allocation from the PV entry zone or reclamation from the inactive queue fails. Previously, reclamation destroyed mappings to both inactive and active pages. get_pv_entry() still, however, wakes up the page daemon when reclamation occurs. The reason being that the page daemon may move some pages from the active queue to the inactive queue, making some new pages available to future reclamations. Print the "reclaiming PV entries" message at most once per minute, but don't stop printing it after the fifth time. This way, we do not give the impression that the problem has gone away. Reviewed by: tegge
# 152042	04-Nov-2005	alc	Begin and end the initialization of pvzone in pmap_init(). Previously, pvzone's initialization was split between pmap_init() and pmap_init2(). This split initialization was the underlying cause of some UMA panics during initialization. Specifically, if the UMA boot pages was exhausted before the pvzone was fully initialized, then UMA, through no fault of its own, would use an inappropriate back-end allocator leading to a panic. (Previously, as a workaround, we have increased the UMA boot pages.) Fortunately, there is no longer any reason that pvzone's initialization cannot be completed in pmap_init(). Eliminate a check for whether pv_entry_high_water has been initialized or not from get_pv_entry(). Since pvzone's initialization is completed in pmap_init(), this check is no longer needed. Use cnt.v_page_count, the actual count of available physical pages, instead of vm_page_array_size to compute the maximum number of pv entries. Introduce the vm.pmap.pv_entries tunable on alpha and ia64. Eliminate some unnecessary white space. Discussed with: tegge (item #1) Tested by: marcel (ia64)
# 152002	03-Nov-2005	alc	Remove the remaining spl*() calls. Add some assertions. Eliminate some excessive white space.
# 151543	21-Oct-2005	ade	Specifically panic() in the case where pmap_insert_entry() fails to get a new pv under high system load where the available pv entries have been exhausted before the pagedaemon has a chance to wake up to reclaim some. Prior to this, the NULL pointer dereference ended up causing secondary panics with rather less than useful resulting tracebacks. Reviewed by: alc, jhb MFC after: 1 week
# 149806	05-Sep-2005	marcel	o In pmap_remove_pte: always invalidate the page. Previously the page was not invalidated if the PTE was not actually being removed. In an UP kernel this didn't cause problems, because the new mapping would preempt the old one. In an SMP kernel this could lead to the use of stale translations when processes move between CPUs at the "right" moment. This fixes the last of the obvious SMP problems and it should be safe to enable SMP by default now. o In pmap_remove_pte: minor code refactoring to avoid duplication. o Test all PTE pointers against NULL. Don't use implicit boolean tests.
# 149777	03-Sep-2005	marcel	o s/vhpt_size/pmap_vhpt_log2size/g o s/vhpt_base/pmap_vhpt_base/g o s/vhpt_bucket/pmap_vhpt_bucket/g o Declare the above in <machine/pmap.h> o Move the vm.stats.vhpt.* sysctls to machdep.vhpt.* o Create a tunable machdep.vhpt.log2size, with corresponding sysctl. The tunable allows the user to specify the VHPT size from the loader. o Don't keep track of the number of PTEs in the VHPT. Calculate the population when necessary by iterating the buckets and summing up the length of the buckets. o Don't perform the tpa instruction with a bucket lock held. The instruction can (theoretically) fault and locking is not needed.
# 149770	03-Sep-2005	marcel	Fix collision chain termination checks. The result of IA64_PHYS_TO_RR7 is never 0, so one cannot test for a NULL pointer after a physical address is translated into a virtual pointer with said macro. Instead, keep the physical address around and test it against 0. Note that this obviously implies that a PTE can never be allocated at physical address 0. This isn't exactly guaranteed, but hasn't been a problem so far. We test the physical address against 0 for as long as the ia64 port exists...
# 149768	03-Sep-2005	alc	Pass a value of type vm_prot_t to pmap_enter_quick() so that it determine whether the mapping should permit execute access.
# 149037	13-Aug-2005	marcel	o s/pmap_lpte_/pmap_/g o Remove pmap_is_referenced(). It was already compiled-out.
# 148807	06-Aug-2005	marcel	Improve SMP support: o Allocate a VHPT per CPU. The VHPT is a hash table that the CPU uses to look up translations it can't find in the TLB. As such, the VHPT serves as a level 1 cache (the TLB being a level 0 cache) and best results are obtained when it's not shared between CPUs. The collision chain (i.e. the hash bucket) is shared between CPUs, as all buckets together constitute our collection of PTEs. To achieve this, the collision chain does not point to the first PTE in the list anymore, but to a hash bucket head structure. The head structure contains the pointer to the first PTE in the list, as well as a mutex to lock the bucket. Thus, each bucket is locked independently of each other. With at least 1024 buckets in the VHPT, this provides for sufficiently finei-grained locking to make the ssolution scalable to large SMP machines. o Add synchronisation to the lazy FP context switching. We do this with a seperate per-thread lock. On SMP machines the lazy high FP context switching without synchronisation caused inconsistent state, which resulted in a panic. Since the use of the high FP registers is not common, it's possible that races exist. The ia64 package build has proven to be a good stress test, so this will get plenty of exercise in the near future. o Don't use the local ID of the processor we want to send the IPI to as the argument to ipi_send(). use the struct pcpu pointer instead. The reason for this is that IPI delivery is unreliable. It has been observed that sending an IPI to a CPU causes it to receive a stray external interrupt. As such, we need a way to make the delivery reliable. The intended solution is to queue requests in the target CPU's per-CPU structure and use a single IPI to inform the CPU that there's a new entry in the queue. If that IPI gets lost, the CPU can check it's queue at any convenient time (such as for each clock interrupt). This also allows us to send requests to a CPU without interrupting it, if such would be beneficial. With these changes SMP is almost working. There are still some random process crashes and the machine can hang due to having the IPI lost that deals with the high FP context switch. The overhead of introducing the hash bucket head structure results in a performance degradation of about 1% for UP (extra pointer indirection). This is surprisingly small and is offset by gaining reasonably/good scalable SMP support.
# 147217	10-Jun-2005	alc	Introduce a procedure, pmap_page_init(), that initializes the vm_page's machine-dependent fields. Use this function in vm_pageq_add_new_page() so that the vm_page's machine-dependent and machine-independent fields are initialized at the same time. Remove code from pmap_init() for initializing the vm_page's machine-dependent fields. Remove stale comments from pmap_init(). Eliminate the Boolean variable pmap_initialized from the alpha, amd64, i386, and ia64 pmap implementations. Its use is no longer required because of the above changes and earlier changes that result in physical memory that is being mapped at initialization time being mapped without pv entries. Tested by: cognet, kensmith, marcel
# 145173	16-Apr-2005	marcel	Add a kpte command to DDB. It dumps the PTE of a KVA. This helps to analyze faults and TLB/VHPT inconsistencies.
# 139790	06-Jan-2005	imp	/* -> /*- for copyright notices, minor format tweaks as necessary
# 139241	23-Dec-2004	alc	Modify pmap_enter_quick() so that it expects the page queues to be locked on entry and it assumes the responsibility for releasing the page queues lock if it must sleep. Remove a bogus comment from pmap_enter_quick(). Using the first change, modify vm_map_pmap_enter() so that the page queues lock is acquired and released once, rather than each time that a page is mapped.
# 138897	15-Dec-2004	alc	In the common case, pmap_enter_quick() completes without sleeping. In such cases, the busying of the page and the unlocking of the containing object by vm_map_pmap_enter() and vm_fault_prefault() is unnecessary overhead. To eliminate this overhead, this change modifies pmap_enter_quick() so that it expects the object to be locked on entry and it assumes the responsibility for busying the page and unlocking the object if it must sleep. Note: alpha, amd64, i386 and ia64 are the only implementations optimized by this change; arm, powerpc, and sparc64 still conservatively busy the page and unlock the object within every pmap_enter_quick() call. Additionally, this change is the first case where we synchronize access to the page's PG_BUSY flag and busy field using the containing object's lock rather than the global page queues lock. (Modifications to the page's PG_BUSY flag and busy field have asserted both locks for several weeks, enabling an incremental transition.)
# 138752	12-Dec-2004	marcel	Fix the last of the instability and the cause of the annoying "vm_fault: fault on nofault entry, addr: %lx" panic. The problem was a stale PTE in the TLB that marked the page as not present, even though we had a good PTE in the VHPT. We typically don't yet insert PTEs in the TLB. We do that lazily. The CPU will look for the PTE in the VHPT when there's no PTE in the TLB. Unfortunately this doesn't handle the case of the stale PTE in the TLB. The quick fix is to invalidate the TLB (sloppily) when the VHPT doesn't contain a valid PTE. This is also the only case that may cause a PTE in the TLB that marks a page as non-present.
# 137978	21-Nov-2004	marcel	Remove struct ia64_itir and use a plain old uint64_t instead.
# 136070	02-Oct-2004	alc	The physical address stored in the vm_page is page aligned. There is no need to mask off the page offset bits. (This operation made some sense prior to i386/i386/pmap.c revision 1.254 when we passed a physical address rather than a vm_page pointer to pmap_enter().)
# 136050	02-Oct-2004	alc	Eliminate unnecessary uses of PHYS_TO_VM_PAGE() from pmap_enter(). These uses predate the change in the pmap_enter() interface that replaced the page's physical address by the address of its vm_page structure. The PHYS_TO_VM_PAGE() was being used to compute the address of the same vm_page structure that was being passed in.
# 135590	22-Sep-2004	marcel	Redefine a PTE as a 64-bit integral type instead of a struct of bit-fields. Unify the PTE defines accordingly and update all uses.
# 135589	22-Sep-2004	marcel	s/u_int#_t/uint#_t/g
# 135443	18-Sep-2004	alc	Release the page queues lock earlier in pmap_protect() and pmap_remove() in order to reduce contention.
# 134509	30-Aug-2004	alc	Remove unnecessary check for curthread == NULL.
# 134393	27-Aug-2004	alc	The machine-independent parts of the virtual memory system always pass a valid pmap to the pmap functions that require one. Remove the checks for NULL. (These checks have their origins in the Mach pmap.c that was integrated into BSD. None of the new code written specifically for FreeBSD included them.)
# 133405	09-Aug-2004	marcel	Better preserve the original protection for the mappings we maintain. The hardware always gives read access for privilege level 0, which means that we cannot use the hardware access rights and privilege level in the PTE to test whether there's a change in protection. So, we save the original vm_prot_t in the PTE as well. Add pmap_pte_prot() to set the proper access rights and privilege level on the PTE given a pmap and the requested protection. The above allows us to compare the protection in pmap_extract_and_hold() which was missing. While in pmap_extract_and_hold(), add pmap locking. While here, clean up most (i.e. all but one) PTE macros we inherited from alpha. They were either unused, used inconsistently, badly named or simply weren't beneficial. We save the wired and managed state of the PTE in distinct (bit) fields. While in pte.h, s/u_int64_t/uint64_t/g pmap locking obtained from: alc@ feedback & review by: alc@
# 132897	30-Jul-2004	alc	- Add pmap locking to ia64's pmap_enter() and pmap_enter_quick(). (This brings ia64 to parity with alpha, amd64, and i386 in this area.) - Prevent a race in pmap_find_pte(): If pmap_find_pte() sleeps in uma_zalloc(), another thread could allocate a pte at the same address. Instead, sleep at a higher level and retry the lookup before retrying the allocation. Reviewed and tested by: marcel@
# 132522	22-Jul-2004	alc	In pmap_mincore() create a private copy of the pte for use after the pmap lock is released.
# 132487	21-Jul-2004	alc	Additional pmap locking Tested by: marcel@
# 132378	19-Jul-2004	alc	Add partial pmap locking. Tested by: marcel@
# 132235	16-Jul-2004	alc	Remove unused fields from the pmap.
# 132220	15-Jul-2004	alc	Push down the acquisition and release of the page queues lock into pmap_protect() and pmap_remove(). In general, they require the lock in order to modify a page's pv list or flags. In some cases, however, pmap_protect() can avoid acquiring the lock.
# 132170	15-Jul-2004	alc	A loop in pmap_remove() should use TAILQ_FOREACH_SAFE(), not TAILQ_FOREACH(), because the loop deletes elements from the list. Reviewed by: marcel@
# 132085	13-Jul-2004	alc	Simplify pmap_protect().
# 132082	13-Jul-2004	alc	Push down the acquisition and release of the page queues lock into pmap_remove_pages(). (The implementation of pmap_remove_pages() is optional. If pmap_remove_pages() is unimplemented, the acquisition and release of the page queues lock is unnecessary.) Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().
# 131662	05-Jul-2004	alc	- Correct pmap_extract()'s return type. It should be vm_paddr_t, not vm_offset_t. - Convert pmap_extract() to the ANSI style of declaration.
# 130742	19-Jun-2004	alc	Remove dead code related to pv entry allocation. Reviewed by: marcel@
# 130362	11-Jun-2004	alc	Neither pmap_enter() nor pmap_enter_quick() should create pv entries for unmanaged pages. Tested by: marcel@
# 130338	11-Jun-2004	alc	Reduce the number of preallocated pv entries and lpte entries in pmap_init(). Tested by: marcel@
# 128105	11-Apr-2004	alc	Remove a comment that refers to avail_start and avail_end as these variables no longer exist.
# 128097	10-Apr-2004	alc	- pmap_kenter_temporary() is unused by machine-independent code. Therefore, move its declaration to the machine-dependent header file on those machines that use it. In principle, only i386 should have it. Alpha and AMD64 should use their direct virtual-to-physical mapping. - Remove pmap_kenter_temporary() from ia64. It is unused. Approved by: marcel@
# 127922	05-Apr-2004	alc	Remove avail_end. As of yesterday, it is unused.
# 127875	05-Apr-2004	alc	Remove avail_start on those platforms that no longer use it. (Only amd64 does anything with it beyond simple initialization.)
# 127869	04-Apr-2004	alc	Remove unused arguments from pmap_init().
# 126728	07-Mar-2004	alc	Retire pmap_pinit2(). Alpha was the last platform that used it. However, ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps, there has been no reason for having this function on Alpha. Briefly, when pmap_growkernel() relied upon the list of all processes to find and update the various pmaps to reflect a growth in the kernel's valid address space, pmap_init2() served to avoid a race between pmap initialization and pmap_growkernel(). Specifically, pmap_pinit2() was responsible for initializing the kernel portions of the pmap and pmap_pinit2() was called after the process structure contained a pointer to the new pmap for use by pmap_growkernel(). Thus, an update to the kernel's address space might be applied to the new pmap unnecessarily, but an update would never be lost.
# 126716	07-Mar-2004	alc	Integrate the code from pmap_pinit2() into pmap_pinit(), leaving pmap_pinit2() empty. Approved by: marcel
# 120914	08-Oct-2003	marcel	Include <sys/smp.h> for the prototype of smp_rendezvous().
# 120722	03-Oct-2003	alc	Migrate pmap_prefault() into the machine-independent virtual memory layer. A small helper function pmap_is_prefaultable() is added. This function encapsulate the few lines of pmap_prefault() that actually vary from machine to machine. Note: pmap_is_prefaultable() and pmap_mincore() have much in common. Going forward, it's worth considering their merger.
# 120294	20-Sep-2003	marcel	Move uma_small_alloc() and uma_small_free() to uma_machdep.c. These functions reference UMA internals from <vm/uma_int.h>, which makes them highly unwanted in non-UMA specific files. While here, prune the includes in pmap.c and use __FBSDID(). Move the includes above the descriptive comment. The copyright of uma_machdep.c is assigned to the project and can be reassigned to the foundation if and when when such is preferrable.
# 119999	12-Sep-2003	alc	Add a new parameter to pmap_extract_and_hold() that is needed to eliminate Giant from vmapbuf(). Idea from: tegge
# 119906	09-Sep-2003	marcel	Introduce IA64_ID_PAGE_{MASK\|SHIFT\|SIZE} and LOG2_ID_PAGE_SIZE. The latter is a kernel option for IA64_ID_PAGE_SHIFT, which in turn determines IA64_ID_PAGE_MASK and IA64_ID_PAGE_SIZE. The constants are used instead of the literal hardcoding (in its various forms) of the size of the direct mappings created in region 6 and 7. The default and probably only workable size is still 256M, but for kicks we use 128M for LINT.
# 119869	08-Sep-2003	alc	Introduce a new pmap function, pmap_extract_and_hold(). This function atomically extracts and holds the physical page that is associated with the given pmap and virtual address. Such a function is needed to make the memory mapping optimizations used by, for example, pipes and raw disk I/O MP-safe. Reviewed by: tegge
# 119861	07-Sep-2003	alc	MFamd64/i386 Add necessary page locking to pmap_mincore().
# 118640	07-Aug-2003	marcel	MFi386 1.422 & 1.423: lock page queues in pmap_insert_entry().
# 118244	31-Jul-2003	bmilekic	Make sure that when the PV ENTRY zone is created in pmap, that it's created not only with UMA_ZONE_VM but also with UMA_ZONE_NOFREE. In the i386 case in particular, the pmap code would hook a special page allocation routine that allocated from kernel_map and not kmem_map, and so when/if the pageout daemon drained the zones, it could actually push out slabs from the PV ENTRY zone but call UMA's default page_free, which resulted in pages allocated from kernel_map being freed to kmem_map; bad. kmem_free() ignores the return value of the vm_map_delete and just returns. I'm not sure what the exact repercussions could be, but it doesn't look good. In the PAE case on i386, we also set-up a zone in pmap, so be conservative for now and make that zone also ZONE_NOFREE and ZONE_VM. Do this for the pmap zones for the other archs too, although in some cases it may not be entirely necessarily. We'd rather be safe than sorry at this point. Perhaps all UMA_ZONE_VM zones should by default be also UMA_ZONE_NOFREE? May fix some of silby's crashes on the PV ENTRY zone.
# 118237	30-Jul-2003	peter	Remove leftover relic of pmap_new_thread() etc.
# 118024	25-Jul-2003	alc	MFi386 revision 1.416 Add vm object locking to pmap_prefault(). Note: powerpc and sparc64 do not implement this function.
# 117206	03-Jul-2003	alc	Background: pmap_object_init_pt() premaps the pages of a object in order to avoid the overhead of later page faults. In general, it implements two cases: one for vnode-backed objects and one for device-backed objects. Only the device-backed case is really machine-dependent, belonging in the pmap. This commit moves the vnode-backed case into the (relatively) new function vm_map_pmap_enter(). On amd64 and i386, this commit only amounts to code rearrangement. On alpha and ia64, the new machine independent (MI) implementation of the vnode case is smaller and more efficient than their pmap-based implementations. (The MI implementation takes advantage of the fact that objects in -CURRENT are ordered collections of pages.) On sparc64, pmap_object_init_pt() hadn't (yet) been implemented.
# 117045	29-Jun-2003	alc	- Export pmap_enter_quick() to the MI VM. This will permit the implementation of a largely MI pmap_object_init_pt() for vnode-backed objects. pmap_enter_quick() is implemented via pmap_enter() on sparc64 and powerpc. - Correct a mismatch between pmap_object_init_pt()'s prototype and its various implementations. (I plan to keep pmap_object_init_pt() as the MD hook for device-backed objects on i386 and amd64.) - Correct an error in ia64's pmap_enter_quick() and adjust its interface to match the other versions. Discussed with: marcel
# 117022	29-Jun-2003	alc	- Remove the calls to pmap_install() from pmap_object_init_pt(); they are redundant. Discussed with: marcel - MFi386: Add vm object locking to pmap_object_init_pt().
# 116510	18-Jun-2003	alc	Fix a performance bug in all of the various implementations of uma_small_alloc(): They always zeroed the page regardless of what the caller requested.
# 116355	14-Jun-2003	alc	Migrate the thread stack management functions from the machine-dependent to the machine-independent parts of the VM. At the same time, this introduces vm object locking for the non-i386 platforms. Two details: 1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The different machine-dependent implementations used various combinations of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable guard page, set KSTACK_GUARD_PAGES to 0. 2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In 5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed to vm_page_alloc() or vm_page_grab().
# 116328	14-Jun-2003	alc	Move the _new_altkstack() and _dispose_altkstack() functions out of the various pmap implementations into the machine-independent vm. They were all identical.
# 115937	07-Jun-2003	marcel	pmap_find_vhpt() has been observed to return a NULL pointer when the caller assumes this to not happen by means of performing an indirection without checking the return value. Add KASSERTs to force a kernel with INVARIANTS to panic. This is a short-term measure. The pmap code is scheduled to be overhauled.
# 115339	26-May-2003	marcel	Revision 1.99 of this file changed the allocation request from VM_ALLOC_INTERRUPT to VM_ALLOC_SYSTEM. There was no mention of this in commit log as it was considered harmless. Guess what: it does harm. WITNESS showed that we can not safely grab the page queue lock in vm_page_alloc() in all cases as we may have to sleep on it. Revert the request to VM_ALLOC_INTERRUPT to circumvent this. We panic if vm_page_alloc returns 0. I'm not entirely happy about this, but we have bigger fish to fry. Approved by: re@ (blanket)
# 115176	20-May-2003	marcel	Prevent corruption of the VHPT collision chain by protecting it with a mutex. The only volatile chain operations are insertion and deletion but since updating an existing PTE also updates the VHPT entry itself, and we have the VHPT mutex in both other cases, we also lock when we update an existing PTE even though no chain operation is involved. Note that we perform the insertion and deletion careful enough that we don't need to lock traversals. If we need to lock traversals, we also need to lock from the exception handler, which we can't without creating a trapframe. We're now able to withstand a -j8 buildworld. More work is needed to withstand Murphy fields. In other words: we still have a bogon... Approved by: re@ (blanket)
# 115152	19-May-2003	marcel	Turn pmap_install_pte() into a critical section. We better not get interrupted while writing into the VHPT table. While here, make sure memory accesses a properly ordered. Tag invalidation must happen first so that the hardware VHPT walker will not be able to match this entry while we're updating it and we have to make sure the new new tag gets written only after the PTE is completely updated. Approved by: re (blanket)
# 115149	19-May-2003	marcel	Unconditionally set pcb_current_pmap. WIP versions of the code previously committed cleared pcb_current_pmap prior to changing the region registers, but that was removed before committing. Since we don't normally (at all?) pass a NULL pointer, the bug was mostly harmless. Fix it while I'm here... I'm here because we need to have data serialization after writing to the region registers. Not doing so was likely the cause of the hangs we were experiencing. General exceptions in cpu_switch may also be caused by the lack of serialization. Approved by: re (blanket)
# 115148	19-May-2003	marcel	pmap_install() needs to be atomic WRT to context switching. Protect switching user regions (region 0-4) with schedlock. Avoid unnecessary recursion on schedlock by moving the core functionality to another function (pmap_switch()) where we assert schedlock is held. Turn pmap_install() into a wrapper that grabs schedlock. This minimizes the number of callsites that need to be changed. Since we already have schedlock in cpu_switch() and cpu_throw(), have them call pmap_switch() directly. These were also the only two calls to pmap_install() outside pmap.c, so make pmap_install() static and remove its prototype from pmap.h Approved by: re (blanket)
# 115084	16-May-2003	marcel	Revamp of the syscall path, exception and context handling. The prime objectives are: o Implement a syscall path based on the epc inststruction (see sys/ia64/ia64/syscall.s). o Revisit the places were we need to save and restore registers and define those contexts in terms of the register sets (see sys/ia64/include/_regset.h). Secundairy objectives: o Remove the requirement to use contigmalloc for kernel stacks. o Better handling of the high FP registers for SMP systems. o Switch to the new cpu_switch() and cpu_throw() semantics. o Add a good unwinder to reconstruct contexts for the rare cases we need to (see sys/contrib/ia64/libuwx) Many files are affected by this change. Functionally it boils down to: o The EPC syscall doesn't preserve registers it does not need to preserve and places the arguments differently on the stack. This affects libc and truss. o The address of the kernel page directory (kptdir) had to be unstaticized for use by the nested TLB fault handler. The name has been changed to ia64_kptdir to avoid conflicts. The renaming affects libkvm. o The trapframe only contains the special registers and the scratch registers. For syscalls using the EPC syscall path no scratch registers are saved. This affects all places where the trapframe is accessed. Most notably the unaligned access handler, the signal delivery code and the debugger. o Context switching only partly saves the special registers and the preserved registers. This affects cpu_switch() and triggered the move to the new semantics, which additionally affects cpu_throw(). o The high FP registers are either in the PCB or on some CPU. context switching for them is done lazily. This affects trap(). o The mcontext has room for all registers, but not all of them have to be defined in all cases. This mostly affects signal delivery code now. The *context syscalls are as of yet still unimplemented. Many details went into the removal of the requirement to use contigmalloc for kernel stacks. The details are mostly CPU specific and limited to exception_save() and exception_restore(). The few places where we create, destroy or switch stacks were mostly simplified by not having to construct physical addresses and additionally saving the virtual addresses for later use. Besides more efficient context saving and restoring, which of course yields a noticable speedup, this also fixes the dreaded SMP bootup problem as a side-effect. The details of which are still not fully understood. This change includes all the necessary backward compatibility code to have it handle older userland binaries that use the break instruction for syscalls. Support for break-based syscalls has been pessimized in favor of a clean implementation. Due to the overall better performance of the kernel, this will still be notived as an improvement if it's noticed at all. Approved by: re@ (jhb)
# 115063	16-May-2003	marcel	o In pmap_install, don't prevent switching the pmap if we're switching to kernel_pmap. The pmap is not special enough. o Clear the active bit on the pmap we're switching out. o Fix some nearby style(9) bugs. Approved by: re@
# 115059	16-May-2003	marcel	Indent a comment. This makes 1.100. Still approved by: re@ (blanket)
# 115058	16-May-2003	marcel	Turn pmap_growkernel() into a critical section. While here, initialize kernel_vm_end in pmap_bootstrap. Don't delay the initialization until we need to grow the kernel VM space. This BTW happens twice before we enter either single- or multi-user mode. Don't adjust kernel_vm_end while growing based on whether the KPT contains a non-NULL entry. We trust kernel_vm_end to be correct and we make sure it's still correct after growing. Define virtual_avail and virtual_end in terms of VM_MIN_KERNEL_ADDRESS and VM_MAX_KERNEL_ADDRESS (resp). Don't hardcode region knowledge.
# 115057	16-May-2003	marcel	Revamp the RID allocation code: o Limit the size of the region ID map to 64KB. This gives a bitmap that is large enough to keep track of 2^19 numbers. The minimal map size is 32KB. The reason we limit the map size is that processor models may have implemented a 24-bit region ID, which would give a 2MB bitmap while the maximum number of allocations is always less than PID_MAX*5, which is less than 2^19. o Allocate all region IDs up-front. The slight downside of reserving more RIDs then a process needs (3 for ia64 native and 1 for ia32) is preferable over the call to pmap_ensure_rid() where RIDs are allocated on demand. On SMP systems this may lead to a race condition. o When allocating a region ID, don't use arc4random(). We're not interested in randomness or uniform distribution across the spectrum. We only need uniqueness. Random numbers may easily collide when the number of allocated RIDs is high, creating a possibly unbounded retry rate.
# 115056	16-May-2003	marcel	Move the conditional definition of KSTACK_MAX_PAGES up ahead where it's more visible. Approved by: re@ (blanket)
# 113831	21-Apr-2003	marcel	Don't use the tpa instruction to implement pmap_kextract. The tpa instruction requires that a translation is present in the TC. This may trigger a TLB miss and a subsequent call to vm_fault(). This implementation is deliberately non-inline for debugging and profiling purposes. Partial or full inlining should eventually be done. Valuable insights by: jake
# 111462	25-Feb-2003	mux	Cleanup of the d_mmap_t interface. - Get rid of the useless atop() / pmap_phys_address() detour. The device mmap handlers must now give back the physical address without atop()'ing it. - Don't borrow the physical address of the mapping in the returned int. Now we properly pass a vm_offset_t * and expect it to be filled by the mmap handler when the mapping was successful. The mmap handler must now return 0 when successful, any other value is considered as an error. Previously, returning -1 was the only way to fail. This change thus accidentally fixes some devices which were bogusly returning errno constants which would have been considered as addresses by the device pager. - Garbage collect the poorly named pmap_phys_address() now that it's no longer used. - Convert all the d_mmap_t consumers to the new API. I'm still not sure wheter we need a __FreeBSD_version bump for this, since and we didn't guarantee API/ABI stability until 5.1-RELEASE. Discussed with: alc, phk, jake Reviewed by: peter Compile-tested on: LINT (i386), GENERIC (alpha and sparc64) Runtime-tested on: i386
# 111119	19-Feb-2003	imp	Back out M_* changes, per decision of the TRB. Approved by: trb
# 110959	15-Feb-2003	marcel	Fix misuse of Maxmem in the calculation of the VHPT size. Maxmem is already in pages, so we should not convert from bytes to pages. The result of this bug was bad scaling of the VHPT relative to the available memory. Submitted by: Arun Sharma <arun@sharma-home.net>
# 110784	13-Feb-2003	alc	MFi386 Remove kptobj. Instead, use VM_ALLOC_NOOBJ.
# 110211	01-Feb-2003	marcel	Remove special casing for running in the simulator from the kernel and instead add platform, firmware and EFI stubs to the loader. The net effect of this change is that besides a special console and disk driver, the kernel has no knowledge of the simulator. This has the following advantages: o Simulator support is much harder to break, o It's easier to make use of more feature complete simulators. This would only need a change in the simulator specific loader, o Running SMP kernels within the simulator. Note that ski at this time does not simulate IPIs, so there's no way to start APs. The platform, firmware and EFI stubs describe the following hardware: o 4 CPU Itanium, o 128 MB RAM within the 4GB address space, o 64 MB RAM above the 4GB address space. NOTE: The stubs in the skiloader describe a machine that should in parts be defined by the simulator. Things like processor interrupt block and AP wakeup vector cannot be choosen at random because they require interpretation by the simulator. Currently the simulator is ignorant of this. This change introduces an unofficial SSC call SSC_SAL_SET_VECTORS which is ignored by the simulator. Tested with: ski (version 0.943 for linux)
# 109799	24-Jan-2003	dfr	Fix pmap_extract so that it doesn't panic if the user types 'cat /proc/pid/map' Submitted by: Arun Sharma <arun.sharma@intel.com>
# 109623	21-Jan-2003	alfred	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
# 109615	21-Jan-2003	jeff	- Add a VM_WAIT in the appropriate cases where vm_page_alloc() fails and flags indicate that uma_small_alloc should not. This code should be refactored so that there is not so much cross arch duplication. Reviewed by: jake Spotted by: tmm Tested on: alpha, sparc64 Pointy hat to: jeff and everyone who cut and pasted the bad code. :-)
# 108643	04-Jan-2003	alc	Hold the page queues lock around pmap_remove_pte() in pmap_enter(). Submitted by: Arun Sharma <adsharma@unix-os.sc.intel.com>
# 107394	29-Nov-2002	marcel	Better handle sparse physical memory: Don't use the address range as a measure for available memory to scale the VHPT. Instead, use the previously determined Maxmem. Approved by: re (carte blanc)
# 107028	17-Nov-2002	alc	MFi386 r1.369 - Clear the PG_WRITEABLE flag in pmap_page_protect() if write access is being removed. Return immediately if write access is being removed and PG_WRITEABLE is already clear.
# 106838	13-Nov-2002	alc	Move pmap_collect() out of the machine-dependent code, rename it to reflect its new location, and add page queue and flag locking. Notes: (1) alpha, i386, and ia64 had identical implementations of pmap_collect() in terms of machine-independent interfaces; (2) sparc64 doesn't require it; (3) powerpc had it as a TODO.
# 106753	11-Nov-2002	alc	- Clear the page's PG_WRITEABLE flag in the i386's pmap_changebit() if we're removing write access from the page's PTEs. - Export pmap_remove_all() on alpha, i386, and ia64. (It's already exported on sparc64.)
# 106486	06-Nov-2002	marcel	Define UMA_MD_SMALL_ALLOC so that we can allocate memory with region 7 addresses for use by page tables and kernel stacks. Obtained from: peter
# 105001	12-Oct-2002	marcel	Polish previous commit: o Replace KSTACK_PAGES with pages on panic() in pmap_new_thread(), o Fix style bugs in adjacent code, o Use NULL instead of 0 for pointers, o Save the virtual kstack address if we create an alternate kstack because 1) we can derive the physical (RR7) address from it and 2) we need the virtual address for contigfree() in pmap_dispose_thread(). Thus td_altkstack saves td_md.md_kstackvirt.
# 104939	11-Oct-2002	peter	cut/paste the pmap_new_altkstack stuff from the other platforms. It's no different here. Update the rest of the kstack API's for scottl's changes.
# 104938	11-Oct-2002	peter	Call uma_zalloc on pvzone with M_NOWAIT, just like i386 and alpha. Otherwise we get hundreds of 'could sleep' during boot.
# 104320	01-Oct-2002	phk	Fix the same misinitialization of pmap_prefault_pageorder as on i386. Suggeste by: jake
# 102837	02-Sep-2002	alc	o Remove an initialized but unused variable from pmap_remove_all().
# 102663	31-Aug-2002	peter	Do not use an object for the pte and pv zones on ia64 because it overrides the pmap_allocf() function that we provide above. We still use the limits via other means. Submitted by: jeff
# 102399	25-Aug-2002	alc	o Retire pmap_pageable(). It's an advisory routine that none of our platforms implements.
# 101645	10-Aug-2002	alc	o Remove the setting and clearing of the PG_MAPPED flag from the alpha and ia64 pmap. o Remove the PG_MAPPED flag's declaration.
# 101199	02-Aug-2002	alc	o Lock page queue accesses by vm_page_deactivate().
# 100543	23-Jul-2002	arr	- Pass the VM_ALLOC_WIRED flag to vm_page_alloc() in pmap_growkernel() so that we can avoid a call to vm_page_lock_queues(). Approved by: peter
# 100269	17-Jul-2002	peter	Fix some typos in 1.68 from over a week ago.
# 100268	17-Jul-2002	peter	Cap the initial PV and PTE table preallocations. Otherwise we explode on the Itanium2 system I have when we use up all of the initial 256MB direct mapped region before we are ready to dynamically expand it. The machine that I have has 4 cpus and a very big hole in the middle. This makes the bogus '(last_address - first_address) / PAGE_SIZE' calculations especially dangerous and caused many millions of initial PV/PTE's to be preallocated.
# 100000	14-Jul-2002	alc	o Lock page queue accesses by vm_page_wire().
# 99571	08-Jul-2002	peter	Add a special page zero entry point intended to be called via the single threaded VM pagezero kthread outside of Giant. For some platforms, this is really easy since it can just use the direct mapped region. For others, IPI sending is involved or there are other issues, so grab Giant when needed. We still have preemption issues to deal with, but Alan Cox has an interesting suggestion on how to minimize the problem on x86. Use Luigi's hack for preserving the (lack of) priority. Turn the idle zeroing back on since it can now actually do something useful outside of Giant in many cases.
# 99559	07-Jul-2002	peter	Collect all the (now equivalent) pmap_new_proc/pmap_dispose_proc/ pmap_swapin_proc/pmap_swapout_proc functions from the MD pmap code and use a single equivalent MI version. There are other cleanups needed still. While here, use the UMA zone hooks to keep a cache of preinitialized proc structures handy, just like the thread system does. This eliminates one dependency on 'struct proc' being persistent even after being freed. There are some comments about things that can be factored out into ctor/dtor functions if it is worth it. For now they are mostly just doing statistics to get a feel of how it is working.
# 99422	04-Jul-2002	peter	Back out proc part of last commit. UMA manages the thread cache only, and we just have to deal with the kstack when told to. We do not have a UMA-managed cache for the proc struct and its associated upage yet. So, go back to the old lazy mechanism. Note that if UMA destroys pages that used to contain proc structures, we'll lose the corresponding upage forever. (zones never did this - once a page was allocated, it stayed attached to the proc zone forever)
# 99421	04-Jul-2002	peter	Copy from sparc64/pmap.c rev 1.64 (Retrofit changes from i386/pmap.c rev 1.328-1.331.) but for uarea only. We still have our own broken kstack code here.
# 98773	24-Jun-2002	dfr	Add UMA_ZONE_VM flag to the zones which are used for pmap_enter().
# 98471	20-Jun-2002	peter	panic rather than fault and explode if we fail to contigmalloc a kernel stack. This is still bad(TM), but at least we have a clue when we get hit when contigmalloc fails.
# 98470	20-Jun-2002	peter	Use the canonical pmap_{new,dispose,swapin,swapout}_proc() functions, in this case cut/pasted from sparc64 instead of messing with contigmalloc where it is not needed.
# 96029	04-May-2002	dfr	Use region 7 addresses for the slabs in the PV and PT zones so that we don't confuse the zone allocater by translating region 5 addresses to region 7 addresses (which is unavoidable for PTEs).
# 96019	04-May-2002	marcel	Make sure we don't index the pm_rid array out of bounds in pmap_ensure_rid(). This can happen because the function is called for both user and kernel addresses, while the rid array only has room for user addresses. This bug got exposed by rev 1.58 of ia64/ia64/pmap.c and rev 1.8 of ia64/include/pmap.h.
# 95920	02-May-2002	marcel	In pmap_pinit0, remove duplicate initialization.
# 95710	29-Apr-2002	peter	Tidy up some loose ends. i386/ia64/alpha - catch up to sparc64/ppc: - replace pmap_kernel() with refs to kernel_pmap - change kernel_pmap pointer to (&kernel_pmap_store) (this is a speedup since ld can set these at compile/link time) all platforms (as suggested by jake): - gc unused pmap_reference - gc unused pmap_destroy - gc unused struct pmap.pm_count (we never used pm_count - we track address space sharing at the vmspace)
# 94779	15-Apr-2002	peter	Fix an "oops!" that turned out to be mostly harmless (but gave a warning). I did this right on the sparc64. Store the direct mapped addresses in the correct variables. Submitted by: jake
# 94777	15-Apr-2002	peter	Pass vm_page_t instead of physical addresses to pmap_zero_page[_area]() and pmap_copy_page(). This gets rid of a couple more physical addresses in upper layers, with the eventual aim of supporting PAE and dealing with the physical addressing mostly within pmap. (We will need either 64 bit physical addresses or page indexes, possibly both depending on the circumstances. Leaving this to pmap itself gives more flexibilitly.) Reviewed by: jake Tested on: i386, ia64 and (I believe) sparc64. (my alpha was hosed)
# 94364	10-Apr-2002	dfr	Initialise PCPU_GET(current_pmap) in pmap_bootstrap - cpu_switch needs to be sure that it is always correct and this was not true for the first call to cpu_switch. When thread0 resumed later, it ended up calling pmap_install with a null pmap, which is bad.
# 93818	04-Apr-2002	jhb	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64
# 92870	21-Mar-2002	dfr	Change critical_t to register_t for intr_disable/restore.
# 92851	21-Mar-2002	jeff	Remove references to vm_zone.h and switch over to the new uma API. Approved by: peter
# 92843	20-Mar-2002	alfred	Remove __P. Reviewd by: peter
# 92805	20-Mar-2002	dfr	Change intr_enable to intr_restore for consistency with sparc64.
# 92782	20-Mar-2002	dfr	Replace calls to cpu_critical_enter/exit with appropriate calls to either explicitly disable interrupts or use a real critical section, as appropriate.
# 92696	19-Mar-2002	peter	#if 0 some unused variables (only in #if 0 code)
# 92654	19-Mar-2002	jeff	This is the first part of the new kernel memory allocator. This replaces malloc(9) and vm_zone with a slab like allocator. Reviewed by: arch@
# 92552	18-Mar-2002	dfr	Fix spelling.
# 92263	14-Mar-2002	dfr	* Use a mutex to protect the RID allocator. * Use ptc.g instead of ptc.l so that TLB shootdowns are broadcast to the coherence domain. * Use smp_rendezvous for pmap_invalidate_all to ensure it happens on all cpus. * Dike out a DIAGNOSTIC printf which didn't compile. * Protect the internals of pmap_install with cpu_critical_enter/exit.
# 91403	27-Feb-2002	silby	Fix a horribly suboptimal algorithm in the vm_daemon. In order to determine what to page out, the vm_daemon checks reference bits on all pages belonging to all processes. Unfortunately, the algorithm used reacted badly with shared pages; each shared page would be checked once per process sharing it; this caused an O(N^2) growth of tlb invalidations. The algorithm has been changed so that each page will be checked only 16 times. Prior to this change, a fork/sleepbomb of 1300 processes could cause the vm_daemon to take over 60 seconds to complete, effectively freezing the system for that time period. With this change in place, the vm_daemon completes in less than a second. Any system with hundreds of processes sharing pages should benefit from this change. Note that the vm_daemon is only run when the system is under extreme memory pressure. It is likely that many people with loaded systems saw no symptoms of this problem until they reached the point where swapping began. Special thanks go to dillon, peter, and Chuck Cranor, who helped me get up to speed with vm internals. PR: 33542, 20393 Reviewed by: dillon MFC after: 1 week
# 90361	07-Feb-2002	julian	Pre-KSE/M3 commit. this is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embeded there but remove all teh code that assumes that in preparation for the next commit which will actually move it out. Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
# 88692	30-Dec-2001	marcel	Make vhpt_base and vhpt_size globals so that they can be used by the AP startup code.
# 88245	20-Dec-2001	peter	Replace a bunch of: for (pv = TAILQ_FIRST(&m->md.pv_list); pv; pv = TAILQ_NEXT(pv, pv_list)) { with: TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) {
# 88088	17-Dec-2001	jhb	Modify the critical section API as follows: - The MD functions critical_enter/exit are renamed to start with a cpu_ prefix. - MI wrapper functions critical_enter/exit maintain a per-thread nesting count and a per-thread critical section saved state set when entering a critical section while at nesting level 0 and restored when exiting to nesting level 0. This moves the saved state out of spin mutexes so that interlocking spin mutexes works properly. - Most low-level MD code that used critical_enter/exit now use cpu_critical_enter/exit. MI code such as device drivers and spin mutexes use the MI wrappers. Note that since the MI wrappers store the state in the current thread, they do not have any return values or arguments. - mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is assigned to curthread->td_savecrit during fork_exit(). Tested on: i386, alpha
# 87119	30-Nov-2001	dfr	* Don't use critical_enter/critical_exit when accessing the VHPT - its pointless and would be inadequate for SMP systems. We will rely on the VM system's locks to serialise this for now. * Change pmap_remove() so that if the range being removed is larger than the number of pages mapped by the pmap, we iterate over the currently mapped pages instead of over the virtual address range. This should make a difference when removing large virtual address ranges from an address space.
# 86443	16-Nov-2001	peter	Oops, I accidently merged a whitespace error from the original commit. (whitespace at end of line in rev 1.264 pmap.c). Fix them all.
# 86442	16-Nov-2001	peter	Merge rev 1.264 from i386/pmap.c (tegge via alfred): Protect against an infinite loop when prefaulting pages. This can happen when the vm system maps past the end of an object or tries to map a zero length object, the pmap layer misses the fact that offsets wrap into negative numbers and we get stuck.
# 86441	16-Nov-2001	peter	Merge rev 1.202 from i386/pmap.c (back in 1998 by John Dyson): Make flushing dirty pages work correctly on filesystems that unexpectedly do not complete writes even with sync I/O requests. This should help the behavior of mmaped files when using softupdates (and perhaps in other circumstances also.)
# 86440	16-Nov-2001	peter	Merge rev 1.293 of i386/pmap.c - skip PG_UNMANAGED in pmap_collect()
# 86438	16-Nov-2001	peter	Converge with i386/pmap.c - dont refer to curproc, use curthread.
# 86435	15-Nov-2001	peter	As part of a general cleanup and reconvergence of related pmap code, start tidying up some loose ends. The DEBUG_VA stuff has long since passed its use-by date. It wasn't used on ia64 but got cut/pasted there.
# 86213	09-Nov-2001	dfr	* Make sure we increment pm_stats.resident_count in pmap_enter_quick * Re-organise RID allocation so that we don't accidentally give a RID to two different processes. Also randomise the order to try to reduce collisions in VHPT and TLB. Don't allocate RIDs for regions which are unused. * Allocate space for VHPT based on the size of physical memory. More tuning is needed here. * Add sysctl instrumentation for VHPT - see sysctl vm.stats.vhpt * Fix a bug in pmap_prefault() which prevented it from actually adding pages to the pmap. * Remove ancient dead debugging code. * Add DDB commands for examining translation registers and region registers. The first change fixes the 'free/cache page %p was dirty' panic which I have been seeing when the system is put under moderate load. It also fixes the negative RSS values in ps which have been confusing me for a while. With this set of changes the ia64 port is reliable enough to build its own kernels, even with a 20-way parallel build. Next stop buildworld.
# 85930	02-Nov-2001	dillon	Implement i386/i386/pmap.c 1.292 for alpha, ia64 (avoid free page exhaustion / kernel panic for certain madvise() scenarios)
# 85856	02-Nov-2001	dfr	Remember to actually free the pv_entry in pmap_remove_entry().
# 85439	24-Oct-2001	dfr	* Clear the TLB on boot. * If a pte for a location given to pmap_enter_quick is valid, just give up - don't panic, even if the mapping is different.
# 85142	19-Oct-2001	dfr	Rework pmap so that it separates the PTE structure from the pv_entry structure. This makes it possible to pre-allocate PTEs for the kernel, which is necessary for a reliable implementation of pmap_kenter(). This also avoids wasting space (about 48 bytes per page) for kernel mappings and user mappings of memory-mapped devices. This also fixes a bug with the previous version where the implementation required the pv_entry structure to be physically contiguous but did not enforce this (the structure size was not a power of two). This meant that the pv_entry free list was quickly corrupted as soon as the system was even mildly loaded.
# 85024	16-Oct-2001	dfr	Size the number of pv_entries we use to bootstrap the pv_entry allocator based on the size of physical memory. This should eliminate the tweaking needed for larger memory configurations.
# 84555	05-Oct-2001	dfr	Use physical addresses, not virtual addresses when calling PHYS_TO_VM_PAGE.
# 84127	29-Sep-2001	dfr	* Read parameters for ptc.e instruction from PAL Code. * Add pmap_unmapdev().
# 83907	24-Sep-2001	dfr	Increase the number of bootstrap PVs.
# 83366	12-Sep-2001	julian	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 83197	07-Sep-2001	dfr	Typo in comment.
# 83196	07-Sep-2001	dfr	* Track ref/mod information properly when a mapping changes. * Fix a panic in pmap_remove() for a non-current pmap.
# 82632	31-Aug-2001	peter	Same as i386/i386/pmap.c: clean up some style. This is irrelevant since it is inside #if 0'ed code, but it would be a shame if this stuff got cut/pasted elsewhere.
# 80431	26-Jul-2001	peter	Make PMAP_SHPGPERPROC tunable. One shouldn't need to recompile a kernel for this, since it is easy to run into with large systems with lots of shared mmap space. Obtained from: yahoo
# 77448	29-May-2001	jhb	- Catch up to the VM mutex changes. - Sort includes in a few places.
# 75668	18-Apr-2001	dfr	Don't panic when we try to modify the kernel pmap.
# 74927	28-Mar-2001	jhb	Convert the allproc and proctree locks from lockmgr locks to sx locks.
# 74903	28-Mar-2001	jhb	Switch from save/disable/restore_intr() to critical_enter/exit().
# 73936	07-Mar-2001	jhb	Unrevert the pmap_map() changes. They weren't broken on x86. Sense beaten into me by: peter
# 73903	06-Mar-2001	jhb	Back out the pmap_map() change for now, it isn't completely stable on the i386.
# 73862	06-Mar-2001	jhb	- Rework pmap_map() to take advantage of direct-mapped segments on supported architectures such as the alpha. This allows us to save on kernel virtual address space, TLB entries, and (on the ia64) VHPT entries. pmap_map() now modifies the passed in virtual address on architectures that do not support direct-mapped segments to point to the next available virtual address. It also returns the actual address that the request was mapped to. - On the IA64 don't use a special zone of PV entries needed for early calls to pmap_kenter() during pmap_init(). This gets us in trouble because we end up trying to use the zone allocator before it is initialized. Instead, with the pmap_map() change, the number of needed PV entries is small enough that we can get by with a static pool that is used until pmap_init() is complete. Submitted by: dfr Debugging help: peter Tested by: me
# 71350	21-Jan-2001	des	First step towards an MP-safe zone allocator: - have zalloc() and zfree() always lock the vm_zone. - remove zalloci() and zfreei(), which are now redundant. Reviewed by: bmilekic, jasone
# 69947	12-Dec-2000	jake	- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead of explicit calls to lockmgr. Also provides macros for the flags pased to specify shared, exclusive or release which map to the lockmgr flags. This is so that the use of lockmgr can be easily replaced with optimized reader-writer locks. - Add some locking that I missed the first time.
# 69022	22-Nov-2000	jake	Protect the following with a lockmgr lock: allproc zombproc pidhashtbl proc.p_list proc.p_hash nextpid Reviewed by: jhb Obtained from: BSD/OS and netbsd
# 67247	17-Oct-2000	ps	Implement write combining for crashdumps. This is useful when write caching is disabled on both SCSI and IDE disks where large memory dumps could take up to an hour to complete. Taking an i386 scsi based system with 512MB of ram and timing (in seconds) how long it took to complete a dump, the following results were obtained: Before: After: WCE TIME WCE TIME ------------------ ------------------ 1 141.820972 1 15.600111 0 797.265072 0 65.480465 Obtained from: Yahoo! Reviewed by: peter
# 67213	16-Oct-2000	dfr	In pmap_remove_pv(), only manipulate the page's list if the pv is managed.
# 67017	12-Oct-2000	dfr	* Allocate kernel stacks with contigmalloc() to make exception handling safe - we can't afford to take a TLB trap when we are writing a trapframe. Possibly revisit this later. * Various fixes to pmap_enter() so that it actually works properly.
# 66937	10-Oct-2000	dfr	* Add rudimentary DDB support (no kgdb, no backtrace, no single step). * Track recent changes to SWI code. * Allocate RIDs for pmaps (untested). * Implement assembler version of cpu_switch - its cleaner that way.
# 66633	04-Oct-2000	dfr	Next round of fixes to the ia64 code. This includes simulated clock and disk drivers along with a load of fixes to context switching, fork handling and a load of other stuff I can't remember now. This takes us as far as start_init() before it dies. I guess now I will have to finish off the VM system and syscall handling :-).
# 66486	30-Sep-2000	dfr	Next round of ia64 work, including fixes to context switching, implementing cpu_fork(), copy*str(), bcopy(), copy{in,out}(). With these changes, my test kernel reaches the mountroot prompt.
# 66463	29-Sep-2000	dfr	Implement dirty and access bit exceptions.
# 66458	29-Sep-2000	dfr	This is the first snapshot of the FreeBSD/ia64 kernel. This kernel will not work on any real hardware (or fully work on any simulator). Much more needs to happen before this is actually functional but its nice to see the FreeBSD copyright message appear in the ia64 simulator.