History log of /freebsd-current/sys/i386/i386/pmap.c
Revision Date Author Comments
# f1d73aac 02-Jun-2024 Alan Cox <alc@FreeBSD.org>

pmap: Skip some superpage promotion attempts that will fail

Implement a simple heuristic to skip pointless promotion attempts by
pmap_enter_quick_locked() and moea64_enter(). Specifically, when
vm_fault() calls pmap_enter_quick() to map neighboring pages at the end
of a copy-on-write fault, there is no point in attempting promotion in
pmap_enter_quick_locked() and moea64_enter(). Promotion will fail
because the base pages have differing protection.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D45431
MFC after: 1 week


# deab5717 27-May-2024 Mitchell Horne <mhorne@FreeBSD.org>

Adjust comments referencing vm_mem_init()

I cannot find a time where the function was not named this.

Reviewed by: kib, markj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45383


# fd1aa5b3 02-Feb-2024 John Baldwin <jhb@FreeBSD.org>

x86: Consistently pass true/false to is_pde parameter of pmap_cache_bits

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D43692


# 1f1b2286 31-Jan-2024 John Baldwin <jhb@FreeBSD.org>

pmap: Convert boolean_t to bool.

Reviewed by: kib (older version)
Differential Revision: https://reviews.freebsd.org/D39921


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# 02320f64 19-Oct-2023 Zhenlei Huang <zlei@FreeBSD.org>

pmap: Prefer consistent naming for loader tunable

The sysctl knob 'vm.pmap.pv_entry_max' becomes a loader tunable since
7ff48af7040f (Allow a specific setting for pv entries) but is fetched
from system environment 'vm.pmap.pv_entries'. That is inconsistent and
obscure.

This reverts 36e1b9702e21 (Correct the tunable name in the message).

PR: 231577
Reviewed by: jhibbits, alc, kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42274


# 6c1d6d4c 08-Oct-2023 Bojan Novković <bojan.novkovic@fer.hr>

i386: Add a leaf PTP when pmap_enter(psind=1) creates a wired mapping

Let pmap_enter_pde() create wired mappings. In particular, allocate a
leaf PTP for use during demotion. This is a step towards reverting
commit 64087fd7f372.

Reviewed by: alc, kib, markj
Sponsored by: Google, Inc. (GSoC 2023)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D41635


# 902ed64f 24-Sep-2023 Alan Cox <alc@FreeBSD.org>

i386 pmap: Adapt recent amd64/arm64 superpage improvements

Don't recompute mpte during promotion.

Optimize MADV_WILLNEED on existing superpages.

Standardize promotion conditions across amd64, arm64, and i386.

Stop requiring the accessed bit for superpage promotion.

Tidy up pmap_promote_pde() calls.

Retire PMAP_INLINE. It's no longer used.

Note: Some of these changes are a prerequisite to fixing a panic that
arises when attempting to create a wired superpage mapping by
pmap_enter(psind=1) (as opposed to promotion).

Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D41944


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 3e04ae43 14-Jul-2023 Doug Moore <dougm@FreeBSD.org>

vm_radix_init: use initializer

Several vm_radix tries are not initialized with vm_radix_init. That
works, for now, since static initialization zeroes the root field
anyway, but if initialization changes, these tries will fail. Add
missing initializer calls.

Reviewed by: alc, kib, markj
Differential Revision: https://reviews.freebsd.org/D40971


# 934bfc12 17-Oct-2022 Konstantin Belousov <kib@FreeBSD.org>

Add vm_page_any_valid()

Use it and several other vm_page_*_valid() functions in more places.

Suggested and reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37024


# 4d90a5af 07-Oct-2022 John Baldwin <jhb@FreeBSD.org>

sys: Consolidate common implementation details of PV entries.

Add a <sys/_pv_entry.h> intended for use in <machine/pmap.h> to
define struct pv_entry, pv_chunk, and related macros and inline
functions.

Note that powerpc does not yet use this as while the mmu_radix pmap
in powerpc uses the new scheme (albeit with fewer PV entries in a
chunk than normal due to an used pv_pmap field in struct pv_entry),
the Book-E pmaps for powerpc use the older style PV entries without
chunks (and thus require the pv_pmap field).

Suggested by: kib
Reviewed by: kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D36685


# f49fd63a 22-Sep-2022 John Baldwin <jhb@FreeBSD.org>

kmem_malloc/free: Use void * instead of vm_offset_t for kernel pointers.

Reviewed by: kib, markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D36549


# 7ae99f80 22-Sep-2022 John Baldwin <jhb@FreeBSD.org>

pmap_unmapdev/bios: Accept a pointer instead of a vm_offset_t.

This matches the return type of pmap_mapdev/bios.

Reviewed by: kib, markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D36548


# e6639073 23-Aug-2022 John Baldwin <jhb@FreeBSD.org>

Define _NPCM and the last PC_FREEn constant in terms of _NPCPV.

This applies one of the changes from
5567d6b4419b02a2099527228b1a51cc55a5b47d to other architectures
besides arm64.

Reviewed by: kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D36263


# f5ad538d 18-Jul-2022 Mateusz Guzik <mjg@FreeBSD.org>

i386: fix pmap_trm_arena_last atomic load type

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 9a22f7fb 09-Apr-2022 Gordon Bergling <gbe@FreeBSD.org>

i386: Remove a double word in a source code comment

- s/an an/an/

MFC after: 3 days


# c1480b17 08-Apr-2022 John Baldwin <jhb@FreeBSD.org>

i386 pmap: Re-quiet set but unused warnings.

__diagused no longer covers KTR, so use explicit #ifdef KTR instead.


# 3c942808 05-Jan-2022 Konstantin Belousov <kib@FreeBSD.org>

Silent some warnings for i386 kernel build

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# e2650af1 29-Dec-2021 Stefan Eßer <se@FreeBSD.org>

Make CPU_SET macros compliant with other implementations

The introduction of <sched.h> improved compatibility with some 3rd
party software, but caused the configure scripts of some ports to
assume that they were run in a GLIBC compatible environment.

Parts of sched.h were made conditional on -D_WITH_CPU_SET_T being
added to ports, but there still were compatibility issues due to
invalid assumptions made in autoconfigure scripts.

The differences between the FreeBSD version of macros like CPU_AND,
CPU_OR, etc. and the GLIBC versions was in the number of arguments:
FreeBSD used a 2-address scheme (one source argument is also used as
the destination of the operation), while GLIBC uses a 3-adderess
scheme (2 source operands and a separately passed destination).

The GLIBC scheme provides a super-set of the functionality of the
FreeBSD macros, since it does not prevent passing the same variable
as source and destination arguments. In code that wanted to preserve
both source arguments, the FreeBSD macros required a temporary copy of
one of the source arguments.

This patch set allows to unconditionally provide functions and macros
expected by 3rd party software written for GLIBC based systems, but
breaks builds of externally maintained sources that use any of the
following macros: CPU_AND, CPU_ANDNOT, CPU_OR, CPU_XOR.

One contributed driver (contrib/ofed/libmlx5) has been patched to
support both the old and the new CPU_OR signatures. If this commit
is merged to -STABLE, the version test will have to be extended to
cover more ranges.

Ports that have added -D_WITH_CPU_SET_T to build on -CURRENT do
no longer require that option.

The FreeBSD version has been bumped to 1400046 to reflect this
incompatible change.

Reviewed by: kib
MFC after: 2 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D33451


# ff93447d 19-Oct-2021 Mark Johnston <markj@FreeBSD.org>

Use the vm_radix_init() helper when initializing pmaps

No functional change intended.

Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32527


# a4667e09 19-Oct-2021 Mark Johnston <markj@FreeBSD.org>

Convert vm_page_alloc() callers to use vm_page_alloc_noobj().

Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ. In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.

Similarly, convert vm_page_alloc_domain() callers.

Note that callers are now responsible for assigning the pindex.

Reviewed by: alc, hselasky, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31986


# b092c58c 14-Jul-2021 Mark Johnston <markj@FreeBSD.org>

Assert that valid PTEs are not overwritten when installing a new PTP

amd64 and 32-bit ARM already had assertions to this effect. Add them to
other pmaps.

Reviewed by: alc, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31171


# 67932460 17-Feb-2021 John Baldwin <jhb@FreeBSD.org>

Add a VA_IS_CLEANMAP() macro.

This macro returns true if a provided virtual address is contained
in the kernel's clean submap.

In CHERI kernels, the buffer cache and transient I/O map are allocated
as separate regions. Abstracting this check reduces the diff relative
to FreeBSD. It is perhaps slightly more readable as well.

Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D28710


# 847ab36b 02-Sep-2020 Mark Johnston <markj@FreeBSD.org>

Include the psind in data returned by mincore(2).

Currently we use a single bit to indicate whether the virtual page is
part of a superpage. To support a forthcoming implementation of
non-transparent 1GB superpages, it is useful to provide more detailed
information about large page sizes.

The change converts MINCORE_SUPER into a mask for MINCORE_PSIND(psind)
values, indicating a mapping of size psind, where psind is an index into
the pagesizes array returned by getpagesizes(3), which in turn comes
from the hw.pagesizes sysctl. MINCORE_PSIND(1) is equal to the old
value of MINCORE_SUPER.

For now, two bits are used to record the page size, permitting values
of MAXPAGESIZES up to 4.

Reviewed by: alc, kib
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26238


# ed83a561 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

i386: clean up empty lines in .c and .h files


# 4ae224c6 16-Jul-2020 Conrad Meyer <cem@FreeBSD.org>

Revert r240317 to prevent leaking pmap entries

Subsequent to r240317, kmem_free() was replaced with kva_free() (r254025).
kva_free() releases the KVA allocation for the mapped region, but no longer
clears the pmap (pagetable) entries.

An affected pmap_unmapdev operation would leave the still-pmap'd VA space
free for allocation by other KVA consumers. However, this bug easily
avoided notice for ~7 years because most devices (1) never call
pmap_unmapdev and (2) on amd64, mostly fit within the DMAP and do not need
KVA allocations. Other affected arch are less popular: i386, MIPS, and
PowerPC. Arm64, arm32, and riscv are not affected.

Reported by: Don Morris <dgmorris AT earthlink.net>
Submitted by: Don Morris (amd64 part)
Reviewed by: kib, markj, Don (!amd64 parts)
MFC after: I don't intend to, but you might want to
Sponsored by: Dell Isilon
Differential Revision: https://reviews.freebsd.org/D25689


# 3b23ffe2 10-Jun-2020 Konstantin Belousov <kib@FreeBSD.org>

amd64 pmap: reorder IPI send and local TLB flush in TLB invalidations.

Right now code first flushes all local TLB entries that needs to be
flushed, then signals IPI to remote cores, and then waits for
acknowledgements while spinning idle. In the VMWare article 'Don’t
shoot down TLB shootdowns!' it was noted that the time spent spinning
is lost, and can be more usefully used doing local TLB invalidation.

We could use the same invalidation handler for local TLB as for
remote, but typically for pmap == curpmap we can use INVLPG for locals
instead of INVPCID on remotes, since we cannot control context
switches on them. Due to that, keep the local code and provide the
callbacks to be called from smp_targeted_tlb_shootdown() after IPIs
are fired but before spin wait starts.

Reviewed by: alc, cem, markj, Anton Rang <rang at acm.org>
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D25188


# 81302f1d 28-May-2020 Mark Johnston <markj@FreeBSD.org>

Fix boot on systems where NUMA domain 0 is unpopulated.

- Add vm_phys_early_add_seg(), complementing vm_phys_early_alloc(), to
ensure that segments registered during hammer_time() are placed in the
right domain. Otherwise, since the SRAT is not parsed at that point,
we just add them to domain 0, which may be incorrect and results in a
domain with only several MB worth of memory.
- Fix uma_startup1() to try allocating memory for zones from any domain.
If domain 0 is unpopulated, the allocation will simply fail, resulting
in a page fault slightly later during boot.
- Change _vm_phys_domain() to return -1 for addresses not covered by the
affinity table, and change vm_phys_early_alloc() to handle wildcard
domains. This is necessary on amd64, where the page array is dense
and pmap_page_array_startup() may allocate page table pages for
non-existent page frames.

Reported and tested by: Rafael Kitover <rkitover@gmail.com>
Reviewed by: cem (earlier version), kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25001


# 10c8fb47 04-Feb-2020 Ryan Libby <rlibby@FreeBSD.org>

uma: convert mbuf_jumbo_alloc to UMA_ZONE_CONTIG & tag others

Remove mbuf_jumbo_alloc and let large mbuf zones use the new uma default
contig allocator (a copy of mbuf_jumbo_alloc). Tag other zones which
require contiguous objects, even if they don't use the new default
contig allocator, so that uma knows about their constraints.

Reviewed by: jeff, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D23238


# f3e982e7 07-Jan-2020 Mark Johnston <markj@FreeBSD.org>

Define a unified pmap structure for i386.

The overloading of struct pmap for PAE and non-PAE pmaps results in
three distinct layouts for the structure, which is embedded in
struct vmspace. This causes a large number of duplicate structure
definitions in the i386 kernel's CTF type graph.

Since most pmap fields are the same in the two pmaps, simply provide
side-by-side variants of the fields that are distinct, using fixed-size
types.

PR: 242689
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22896


# e8bbca1b 07-Jan-2020 Mark Johnston <markj@FreeBSD.org>

Consistently use pmap_t instead of struct pmap *.

MFC after: 3 days
Sponsored by: The FreeBSD Foundation


# 1c3a2410 04-Jan-2020 Alan Cox <alc@FreeBSD.org>

When a copy-on-write fault occurs, pmap_enter() is called on to replace the
mapping to the old read-only page with a mapping to the new read-write page.
To destroy the old mapping, pmap_enter() must destroy its page table and PV
entries and invalidate its TLB entry. This change simply invalidates that
TLB entry a little earlier, specifically, on amd64 and arm64, before the PV
list lock is held.

Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23027


# e1ab4060 28-Dec-2019 Alan Cox <alc@FreeBSD.org>

Correctly implement PMAP_ENTER_NOREPLACE in pmap_enter_{l2,pde}() on kernel
mappings.

Reduce code duplication by defining a function, pmap_abort_ptp(), for
handling a common error case.

Simplify error handling in pmap_enter_quick_locked().

Reviewed by: kib
Tested by: pho
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22890


# 5cff1f4d 10-Dec-2019 Mark Johnston <markj@FreeBSD.org>

Introduce vm_page_astate.

This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state. The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.

This change merely adds the structure and updates references to atomic
state fields. No functional change intended.

Reviewed by: alc, jeff, kib
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22650


# 01cef4ca 16-Oct-2019 Mark Johnston <markj@FreeBSD.org>

Remove page locking from pmap_mincore().

After r352110 the page lock no longer protects a page's identity, so
there is no purpose in locking the page in pmap_mincore(). Instead,
if vm.mincore_mapped is set to the non-default value of 0, re-lookup
the page after acquiring its object lock, which holds the page's
identity stable.

The change removes the last callers of vm_page_pa_tryrelock(), so
remove it.

Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21823


# 638f8678 14-Oct-2019 Jeff Roberson <jeff@FreeBSD.org>

(6/6) Convert pmap to expect busy in write related operations now that all
callers hold it.

This simplifies pmap code and removes a dependency on the object lock.

Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21596


# 205be21d 14-Oct-2019 Jeff Roberson <jeff@FreeBSD.org>

(3/6) Add a shared object busy synchronization mechanism that blocks new page
busy acquires while held.

This allows code that would need to acquire and release a very large number
of page busy locks to use the old mechanism where busy is only checked and
not held. This comes at the cost of false positives but never false
negatives which the single consumer, vm_fault_soft_fast(), handles.

Reviewed by: kib
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21592


# b119329d 25-Sep-2019 Mark Johnston <markj@FreeBSD.org>

Complete the removal of the "wire_count" field from struct vm_page.

Convert all remaining references to that field to "ref_count" and update
comments accordingly. No functional change intended.

Reviewed by: alc, kib
Sponsored by: Intel, Netflix
Differential Revision: https://reviews.freebsd.org/D21768


# 66eb1d63 22-Sep-2019 Konstantin Belousov <kib@FreeBSD.org>

i386: reduce differences in source between PAE and non-PAE pmaps ...

by defining pg_nx as zero for non-PAE and correspondingly simplifying
some expressions.

Suggested and reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21757


# b223a692 22-Sep-2019 Konstantin Belousov <kib@FreeBSD.org>

i386: implement sysctl vm.pmap.kernel_maps.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21739


# e8bcf696 16-Sep-2019 Mark Johnston <markj@FreeBSD.org>

Revert r352406, which contained changes I didn't intend to commit.


# 41fd4b94 16-Sep-2019 Mark Johnston <markj@FreeBSD.org>

Fix a couple of nits in r352110.

- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.

Reported by: alc
Reviewed by: alc, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21639


# fee2a2fa 09-Sep-2019 Mark Johnston <markj@FreeBSD.org>

Change synchonization rules for vm_page reference counting.

There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped. The DRM ports have been updated to
accomodate the KPI changes.

Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486


# c45cbc7a 02-Aug-2019 John Baldwin <jhb@FreeBSD.org>

Don't reset memory attributes when mapping physical addresses for ACPI.

Previously, AcpiOsMemory was using pmap_mapbios which would always map
the requested address Write-Back (WB). For several AMD Ryzen laptops,
the BIOS uses AcpiOsMemory to directly access the PCI MCFG region in
order to access PCI config registers. This has the side effect of
remapping the MCFG region in the direct map as WB instead of UC
hanging the laptops during boot.

On the one laptop I examined in detail, the _PIC global method used to
switch from 8259A PICs to I/O APICs uses a pair of PCI config space
registers at offset 0x84 in the device at 0:0:0 to as a pair of
address/data registers to access an indirect register in the chipset
and clear a single bit to switch modes.

To fix, alter the semantics of pmap_mapbios() such that it does not
modify the attributes of any existing mappings and instead uses the
existing attributes. If a new mapping is created, this new mapping
uses WB (the default memory attribute).

Special thanks to the gentleman whose name I don't have who brought
two affected laptops to the hacker lounge at BSDCan. Direct access to
the affected systems permitted finding the root cause within an hour
or so.

PR: 231760, 236899
Reviewed by: kib, alc
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20327


# 43ded0a3 30-Jul-2019 Alan Cox <alc@FreeBSD.org>

In pmap_advise(), when we encounter a superpage mapping, we first demote the
mapping and then destroy one of the 4 KB page mappings so that there is a
potential trigger for repromotion. Currently, we destroy the first 4 KB
page mapping that falls within the (current) superpage mapping or the
virtual address range [sva, eva). However, I have found empirically that
destroying the last 4 KB mapping produces slightly better results,
specifically, more promotions and fewer failed promotion attempts.
Accordingly, this revision changes pmap_advise() to destroy the last 4 KB
page mapping. It also replaces some nearby uses of boolean_t with bool.

Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D21115


# 5d18382b 25-Jul-2019 Alan Cox <alc@FreeBSD.org>

Simplify the handling of superpages in pmap_clear_modify(). Specifically,
if a demotion succeeds, then all of the 4KB page mappings within the
superpage-sized region must be valid, so there is no point in testing the
validity of the 4KB page mapping that is going to be write protected.

Deindent the nearby code.

Reviewed by: kib, markj
Tested by: pho (amd64, i386)
X-MFC after: r350004 (this change depends on arm64 dirty bit emulation)
Differential Revision: https://reviews.freebsd.org/D21027


# 43184d8e 15-Jul-2019 Alan Cox <alc@FreeBSD.org>

Revert r349973. Upon further reflection, I realized that the comment
deleted by r349973 is still valid on i386. Restore it.

Discussed with: markj


# f1384063 13-Jul-2019 Alan Cox <alc@FreeBSD.org>

Remove a stale comment.

Reported by: markj
MFC after: 1 week


# eeacb3b0 08-Jul-2019 Mark Johnston <markj@FreeBSD.org>

Merge the vm_page hold and wire mechanisms.

The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics. The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.

This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead. Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.

No functional change is intended. __FreeBSD_version is bumped.

Reviewed by: alc, kib
Discussed with: jeff
Discussed with: jhb, np (cxgbe)
Tested by: pho (previous version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19247


# c134ef74 28-Jun-2019 Alan Cox <alc@FreeBSD.org>

When we protect PTEs (as opposed to PDEs), we only call vm_page_dirty()
when, in fact, we are write protecting the page and the PTE has PG_M set.
However, pmap_protect_pde() was always calling vm_page_dirty() when the PDE
has PG_M set. So, adding PG_NX to a writeable PDE could result in
unnecessary (but harmless) calls to vm_page_dirty().

Simplify the loop calling vm_page_dirty() in pmap_protect_pde().

Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20793


# fd2dae0a 08-Jun-2019 Alan Cox <alc@FreeBSD.org>

Implement an alternative solution to the amd64 and i386 pmap problem that we
previously addressed in r348246.

This pmap problem also exists on arm64 and riscv. However, the original
solution developed for amd64 and i386 cannot be used on arm64 and riscv. In
particular, arm64 and riscv do not define a PG_PROMOTED flag in their level
2 PTEs. (A PG_PROMOTED flag makes no sense on arm64, where unlike x86 or
riscv we are required to break the old 4KB mappings before making the 2MB
mapping; and on riscv there are no unused bits in the PTE to define a
PG_PROMOTED flag.)

This commit implements an alternative solution that can be used on all four
architectures. Moreover, this solution has two other advantages. First, on
older AMD processors that required the Erratum 383 workaround, it is less
costly. Specifically, it avoids unnecessary calls to pmap_fill_ptp() on a
superpage demotion. Second, it enables the elimination of some calls to
pagezero() in pmap_kernel_remove_{l2,pde}().

In addition, remove a related stale comment from pmap_enter_{l2,pde}().

Reviewed by: kib, markj (an earlier version)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20538


# 88ea538a 07-Jun-2019 Mark Johnston <markj@FreeBSD.org>

Replace uses of vm_page_unwire(m, PQ_NONE) with vm_page_unwire_noq(m).

These calls are not the same in general: the former will dequeue the
page if it is enqueued, while the latter will just leave it alone. But,
all existing uses of the former apply to unmanaged pages, which are
never enqueued in the first place. No functional change intended.

Reviewed by: kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20470


# d7bb6218 25-May-2019 Mark Johnston <markj@FreeBSD.org>

Remove pmap_pid_dump() from the i386 pmap.

It has not been compilable in a long time and doesn't seem very useful.

Suggested by: kib
MFC after: 1 week


# a9c7546a 24-May-2019 Konstantin Belousov <kib@FreeBSD.org>

Fix a corner case in demotion of kernel mappings.

It is possible for the kernel mapping to be created with superpage by
directly installing pde using pmap_enter_2mpage() without filling the
corresponding page table page. This can happen e.g. if the range is
already backed by reservation and vm_fault_soft_fast() conditions are
satisfied, which was observed on the pipe_map.

In this case, demotion must fill the page obtained from the pmap
radix, same as if the page is newly allocated. Use PG_PROMOTED bit as
an indicator that the page is valid, instead of the wire count of the
page table page.

Since the PG_PROMOTED bit is set on pde when we leave TLB entries for
4k pages around, which in particular means that the ptes were filled,
it provides more correct indicator. Note that pmap_protect_pde()
clears PG_PROMOTED, which handles the case when protection was changed
on the superpage without adjusting ptes.

Reported by: pho
In collaboration with: alc
Tested by: alc, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D20380


# 64087fd7 21-Mar-2019 Mark Johnston <markj@FreeBSD.org>

Disallow preemptive creation of wired superpage mappings.

There are some unusual cases where a process may cause an mlock()ed
range of memory to be unmapped. If the application subsequently
faults on that region, the handler may attempt to create a superpage
mapping backed by the resident, wired pages. However, the pmap code
responsible for creating such a mapping (pmap_enter_pde() on i386
and amd64) does not ensure that a leaf page table page is available
if the superpage is later demoted; the demotion operation must therefore
perform a non-blocking page allocation and must unmap the entire
superpage if the allocation fails. The pmap layer ensures that this
can never happen for wired mappings, and so the case described above
breaks that invariant.

For now, simply ensure that the MI fault handler never attempts to
create a wired superpage except via promotion.

Reviewed by: kib
Reported by: syzbot+292d3b0416c27c131505@syzkaller.appspotmail.com
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19670


# bced332a 28-Feb-2019 Konstantin Belousov <kib@FreeBSD.org>

Invalidate cache for the PDPTE page when using PAE paging but PAT is
not supported.

According to SDM rev. 69 vol. 3, for PDPTE registers loads:
- when PAT is not supported, access to the PDPTE page is performed as
UC, see 4.9.1;
- when PAT is supported, the access is WB, see 4.9.2.

So potentially CPU might load stale memory as PDPTEs if both PAT and
self-snoop are not implemented. To be safe, add total local cache
flush to pmap_cold() before initial load of cr3, and flush PDPTE page
in pmap_pinit(), if PAT is not implemented.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D19365


# d5f2c1e4 26-Feb-2019 Konstantin Belousov <kib@FreeBSD.org>

i386 PAE: avoid atomic for pte_store() where possible.

Instead carefully write upper word, and only than the lower word with
PG_V, for previously invalid ptes. It provides some measurable system
time saving on buildworld.

Reviewed by: markj
Tested by: pho
Measured by: bde (early version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D19226


# eb785fab 06-Feb-2019 Konstantin Belousov <kib@FreeBSD.org>

Port sysctl kern.elf32.read_exec from amd64 to i386.

Make it more comprehensive on i386, by not setting nx bit for any
mapping, not just adding PF_X to all kernel-loaded ELF segments. This
is needed for the compatibility with older i386 programs that assume
that read access implies exec, e.g. old X servers with hand-rolled
module loader.

Reported and tested by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 9a527560 29-Jan-2019 Konstantin Belousov <kib@FreeBSD.org>

i386: Merge PAE and non-PAE pmaps into same kernel.

Effectively all i386 kernels now have two pmaps compiled in: one
managing PAE pagetables, and another non-PAE. The implementation is
selected at cold time depending on the CPU features. The vm_paddr_t is
always 64bit now. As result, nx bit can be used on all capable CPUs.

Option PAE only affects the bus_addr_t: it is still 32bit for non-PAE
configs, for drivers compatibility. Kernel layout, esp. max kernel
address, low memory PDEs and max user address (same as trampoline
start) are now same for PAE and for non-PAE regardless of the type of
page tables used.

Non-PAE kernel (when using PAE pagetables) can handle physical memory
up to 24G now, larger memory requires re-tuning the KVA consumers and
instead the code caps the maximum at 24G. Unfortunately, a lot of
drivers do not use busdma(9) properly so by default even 4G barrier is
not easy. There are two tunables added: hw.above4g_allow and
hw.above24g_allow, the first one is kept enabled for now to evaluate
the status on HEAD, second is only for dev use.

i386 now creates three freelists if there is any memory above 4G, to
allow proper bounce pages allocation. Also, VM_KMEM_SIZE_SCALE changed
from 3 to 1.

The PAE_TABLES kernel config option is retired.

In collaboarion with: pho
Discussed with: emaste
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D18894


# d3f40307 03-Jan-2019 Konstantin Belousov <kib@FreeBSD.org>

Fix typo in r342710.

Noted by: lidl
MFC after: 3 days


# 9bfc7fa4 02-Jan-2019 Mark Johnston <markj@FreeBSD.org>

Avoid setting PG_U unconditionally in pmap_enter_quick_locked().

This KPI may in principle be used to create kernel mappings, in which
case we certainly should not be setting PG_U. In any case, PG_U must be
set on all layers in the page tables to grant user mode access, and we
were only setting it on leaf entries. Thus, this change should have no
functional impact.

Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation


# e68d4438 31-Dec-2018 Konstantin Belousov <kib@FreeBSD.org>

Update comments: paging is initialized in pmap_cold().

MFC after: 3 days
Sponsored by: The FreeBSD Foundation


# 05d5652a 29-Dec-2018 Konstantin Belousov <kib@FreeBSD.org>

i386: Fix allocation of the KVA frame for pmap_quick_enter_page().

Due to the typo, it shared the frame with the CMAP1 transient mapping.

In collaboration with: pho
MFC after: 3 days
Sponsored by: The FreeBSD Foundation (kib)


# 36e1b970 01-Dec-2018 Konstantin Belousov <kib@FreeBSD.org>

Correct the tunable name in the message.

Submitted by: Andre Albsmeier <mail@fbsd.e4m.org>
PR: 231577
MFC after: 1 week


# 9978bd99 30-Oct-2018 Mark Johnston <markj@FreeBSD.org>

Add malloc_domainset(9) and _domainset variants to other allocator KPIs.

Remove malloc_domain(9) and most other _domain KPIs added in r327900.
The new functions allow the caller to specify a general NUMA domain
selection policy, rather than specifically requesting an allocation from
a specific domain. The latter policy tends to interact poorly with
M_WAITOK, resulting in situations where a caller is blocked indefinitely
because the specified domain is depleted. Most existing consumers of
the _domain KPIs are converted to instead use a DOMAINSET_PREF() policy,
in which we fall back to other domains to satisfy the allocation
request.

This change also defines a set of DOMAINSET_FIXED() policies, which
only permit allocations from the specified domain.

Discussed with: gallatin, jeff
Reported and tested by: pho (previous version)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17418


# 36209a40 20-Oct-2018 Mark Johnston <markj@FreeBSD.org>

Add an assertion to pmap_enter().

When modifying an existing managed mapping, we should find a PV entry
for the old mapping. Verify this.

Before r335784 this would have been implicitly tested by the fact that
we always freed the PV entry for the old mapping.

Reviewed by: alc, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D17626


# cb4961ab 01-Oct-2018 Mark Johnston <markj@FreeBSD.org>

Apply r339046 to i386.

Belatedly add a comment to the amd64 pmap explaining why we initialize
the kernel pmap's resident page count.

Reviewed by: alc, kib
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17377


# 5f11ee20 29-Sep-2018 Konstantin Belousov <kib@FreeBSD.org>

Fix UP build.

Reported by: tijl
Sponsored by: The FreeBSD Foundation
Approved by: re (rgrimes)


# d12c4465 19-Sep-2018 Konstantin Belousov <kib@FreeBSD.org>

Convert x86 cache invalidation functions to ifuncs.

This simplifies the runtime logic and reduces the number of
runtime-constant branches.

Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
Differential revision: https://reviews.freebsd.org/D16736


# f0165b1c 28-Aug-2018 Konstantin Belousov <kib@FreeBSD.org>

Remove {max/min}_offset() macros, use vm_map_{max/min}() inlines.

Exposing max_offset and min_offset defines in public headers is
causing clashes with variable names, for example when building QEMU.

Based on the submission by: royger
Reviewed by: alc, markj (previous version)
Sponsored by: The FreeBSD Foundation (kib)
MFC after: 1 week
Approved by: re (marius)
Differential revision: https://reviews.freebsd.org/D16881


# 60b74234 25-Aug-2018 Konstantin Belousov <kib@FreeBSD.org>

Unify amd64 and i386 vmspace0 pmap activation.

Add pmap_activate_boot() for i386, move the invocation on APs from MD
init_secondary() to x86 init_secondary_tail().

Suggested by: alc
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
Approved by: re (marius)
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D16893


# 83a90bff 21-Aug-2018 Alan Cox <alc@FreeBSD.org>

Eliminate kmem_malloc()'s unused arena parameter. (The arena parameter
became unused in FreeBSD 12.x as a side-effect of the NUMA-related
changes.)

Reviewed by: kib, markj
Discussed with: jeff, re@
Differential Revision: https://reviews.freebsd.org/D16825


# e45b89d2 01-Aug-2018 Konstantin Belousov <kib@FreeBSD.org>

Add pmap_is_valid_memattr(9).

Discussed with: alc
Sponsored by: The FreeBSD Foundation, Mellanox Technologies
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D15583


# 6c85795a 27-Jul-2018 Mark Johnston <markj@FreeBSD.org>

Fix handling of KVA in kmem_bootstrap_free().

Do not use vm_map_remove() to release KVA back to the system. Because
kernel map entries do not have an associated VM object, with r336030
the vm_map_remove() call will not update the kernel page tables. Avoid
relying on the vm_map layer and instead update the pmap and release KVA
to the kernel arena directly in kmem_bootstrap_free().

Because the pmap updates will generally result in superpage demotions,
modify pmap_init() to insert PTPs shadowed by superpage mappings into
the kernel pmap's radix tree.

While here, port r329171 to i386.

Reported by: alc
Reviewed by: alc, kib
X-MFC with: r336505
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D16426


# db016164 20-Jul-2018 Alan Cox <alc@FreeBSD.org>

Annotate a parameter as unused.

X-MFC with: r336288


# 697be9a3 15-Jul-2018 Mark Johnston <markj@FreeBSD.org>

Restore the check for the page size extension after r332489.

Without this, the support for transparent superpage promotion on i386
was left disabled.

Reviewed by: alc, kib
Differential Revision: https://reviews.freebsd.org/D16279


# 8c087371 14-Jul-2018 Alan Cox <alc@FreeBSD.org>

Add support for pmap_enter(..., psind=1) to the i386 pmap. In other words,
add support for explicitly requesting that pmap_enter() create a 2 or 4 MB
page mapping. (Essentially, this feature allows the machine-independent
layer to create superpage mappings preemptively, and not wait for automatic
promotion to occur.)

Export pmap_ps_enabled() to the machine-independent layer.

Add a flag to pmap_pv_insert_pde() that specifies whether it should fail or
reclaim a PV entry when one is not available.

Refactor pmap_enter_pde() into two functions, one by the same name, that is
a general-purpose function for creating PDE PG_PS mappings, and another,
pmap_enter_4mpage(), that is used to prefault 2 or 4 MB read- and/or
execute-only mappings for execve(2), mmap(2), and shmat(2).

Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D16246


# 76999163 10-Jul-2018 Alan Cox <alc@FreeBSD.org>

Eliminate unnecessary differences between i386's pmap_enter() and amd64's.
For example, fully construct the new PTE before entering the critical
section. This change is a stepping stone to psind == 1 support on i386.

Reviewed by: kib, markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D16188


# 717d5c0b 08-Jul-2018 Alan Cox <alc@FreeBSD.org>

Invalidate the mapping before updating its physical address.

Doing so ensures that all threads sharing the pmap have a consistent
view of the mapping. This fixes the problem described in the commit
log messages for r329254 without the overhead of an extra fault in the
common case. Once other pmap_enter() implementations are similarly
modified, the workaround added in r329254 can be removed, reducing the
overhead of CoW faults.

See also r335784 for amd64. The i386 implementation of pmap_enter()
already reused the PV entry from the old mapping.

Reviewed by: kib, markj
Tested by: pho
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D16133


# 945a6b31 05-Jul-2018 Konstantin Belousov <kib@FreeBSD.org>

Extend r335969 to superpages.

It is possible that a fictitious unmanaged userspace mapping of
superpage is created on x86, e.g. by pmap_object_init_pt(), with the
physical address outside the vm_page_array[] coverage.

Noted and reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D16085


# a0ef97f6 05-Jul-2018 Konstantin Belousov <kib@FreeBSD.org>

Revert r335999 to re-commit with the correct error message.


# c59dfa63 05-Jul-2018 Konstantin Belousov <kib@FreeBSD.org>

In x86 pmap_extract_and_hold(), there is no need to recalculate the
physical address, which is readily available after sucessfull
vm_page_pa_tryrelock().

Noted and reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D16085


# 81dac871 05-Jul-2018 Konstantin Belousov <kib@FreeBSD.org>

In x86 pmap_extract_and_hold(), there is no need to recalculate the
physical address, which is readily available after sucessfull
vm_page_pa_tryrelock().

Noted and reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D16085


# 84a15fe7 04-Jul-2018 Konstantin Belousov <kib@FreeBSD.org>

In x86 pmap_extract_and_hold()s, handle the case of PHYS_TO_VM_PAGE()
returning NULL.

vm_fault_quick_hold_pages() can be legitimately called on userspace
mappings backed by fictitious pages created by unmanaged device and sg
pagers.

Note that other architectures pmap_extract_and_hold() might need
similar fix, but I postponed the examination.

Reported by: bde
Discussed with: alc
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D16085


# d05d616c 30-May-2018 Konstantin Belousov <kib@FreeBSD.org>

Use pmap_pte_ufast() instead of pmap_pte() in pmap_extract(),
pmap_is_prefaultable() and pmap_incore(), pushing the number of
shootdown IPIs back to the 3/1 kernel.

Benchmarked by: bde
Tested by: pho
Sponsored by: The FreeBSD Foundation


# c5981f69 30-May-2018 Konstantin Belousov <kib@FreeBSD.org>

Extract code for fast mapping of pte from pmap_extract_and_hold()
into the helper function pmap_pte_ufast().

Benchmarked by: bde
Tested by: pho
Sponsored by: The FreeBSD Foundation


# 7883d57a 30-May-2018 Konstantin Belousov <kib@FreeBSD.org>

Restore pmap_copy() for 4/4 i386 pmap.

Create yet another temporal pte mapping routine pmap_pte_quick3(),
which is the copy of the pmap_pte_quick() and relies on the
pvh_global_lock to protect the frame. It accounts into the same
counters as pmap_pte_quick(). It is needed since pmap_copy() uses
pmap_pte_quick() already, and since a user pmap is no longer current
pmap.

pmap_copy() still provides the advantage for real-world workloads
involving lot of forks where processes do not exec immediately.

Benchmarked by: bde
Sponsored by: The FreeBSD Foundation


# d94bd372 30-May-2018 Konstantin Belousov <kib@FreeBSD.org>

Do use pmap_pte_quick() in pmap_enter_quick_locked().

Benchmarked by: bde
Tested by: pho
Sponsored by: The FreeBSD Foundation


# 1095694a 30-May-2018 Konstantin Belousov <kib@FreeBSD.org>

Avoid unneccessary TLB shootdowns in pmap_unwire_ptp() for user pmaps,
which no longer create recursive page table mappings.

Benchmarked by: bde
Tested by: pho
Sponsored by: The FreeBSD Foundation


# ded29bd9 25-May-2018 Konstantin Belousov <kib@FreeBSD.org>

Optimize i386 pmap_extract_and_hold().

In particular, stop using pmap_pte() to read non-promoted pte while
walking the page table. pmap_pte() needs to shoot down the kernel
mapping globally which causes IPI broadcast. Since
pmap_extract_and_hold() is used for slow copyin(9), it is very
significant hit for the 4/4 kernels.

Instead, create single purpose per-processor page frame and use it to
locally map page table page inside the critical section, to avoid
reuse of the frame by other thread if context switched.

Measurement demostrated very significant improvements in any load that
utilizes copyin/copyout.

Found and benchmarked by: bde
Sponsored by: The FreeBSD Foundation


# 507e50d5 12-May-2018 Konstantin Belousov <kib@FreeBSD.org>

Initialize tramp_idleptd during cold pmap startup, before the
exception code is copied to the trampoline.

The correct value is then copied to trampoline automatically, so
tramp_idleptd_reloced can be eliminated.

This will allow to use the same exception entry code to handle traps
from vm86 bios calls on early boot stage, as after the trampoline is
configured.

Sponsored by: The FreeBSD Foundation


# 919015a4 18-Apr-2018 Konstantin Belousov <kib@FreeBSD.org>

Fix pmap_trm_alloc(M_ZERO).

Sponsored by: The FreeBSD Foundation


# d86c1f0d 13-Apr-2018 Konstantin Belousov <kib@FreeBSD.org>

i386 4/4G split.

The change makes the user and kernel address spaces on i386
independent, giving each almost the full 4G of usable virtual addresses
except for one PDE at top used for trampoline and per-CPU trampoline
stacks, and system structures that must be always mapped, namely IDT,
GDT, common TSS and LDT, and process-private TSS and LDT if allocated.

By using 1:1 mapping for the kernel text and data, it appeared
possible to eliminate assembler part of the locore.S which bootstraps
initial page table and KPTmap. The code is rewritten in C and moved
into the pmap_cold(). The comment in vmparam.h explains the KVA
layout.

There is no PCID mechanism available in protected mode, so each
kernel/user switch forth and back completely flushes the TLB, except
for the trampoline PTD region. The TLB invalidations for userspace
becomes trivial, because IPI handlers switch page tables. On the other
hand, context switches no longer need to reload %cr3.

copyout(9) was rewritten to use vm_fault_quick_hold(). An issue for
new copyout(9) is compatibility with wiring user buffers around sysctl
handlers. This explains two kind of locks for copyout ptes and
accounting of the vslock() calls. The vm_fault_quick_hold() AKA slow
path, is only tried after the 'fast path' failed, which temporary
changes mapping to the userspace and copies the data to/from small
per-cpu buffer in the trampoline. If a page fault occurs during the
copy, it is short-circuit by exception.s to not even reach C code.

The change was motivated by the need to implement the Meltdown
mitigation, but instead of KPTI the full split is done. The i386
architecture already shows the sizing problems, in particular, it is
impossible to link clang and lld with debugging. I expect that the
issues due to the virtual address space limits would only exaggerate
and the split gives more liveness to the platform.

Tested by: pho
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D14633


# 8c8ee2ee 04-Mar-2018 Konstantin Belousov <kib@FreeBSD.org>

Unify bulk free operations in several pmaps.

Submitted by: Yoshihiro Ota
Reviewed by: markj
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D13485


# 2c0f13aa 20-Feb-2018 Konstantin Belousov <kib@FreeBSD.org>

vm_wait() rework.

Make vm_wait() take the vm_object argument which specifies the domain
set to wait for the min condition pass. If there is no object
associated with the wait, use curthread' policy domainset. The
mechanics of the wait in vm_wait() and vm_wait_domain() is supplied by
the new helper vm_wait_doms(), which directly takes the bitmask of the
domains to wait for passing min condition.

Eliminate pagedaemon_wait(). vm_domain_clear() handles the same
operations.

Eliminate VM_WAIT and VM_WAITPFAULT macros, the direct functions calls
are enough.

Eliminate several control state variables from vm_domain, unneeded
after the vm_wait() conversion.

Scetched and reviewed by: jeff
Tested by: pho
Sponsored by: The FreeBSD Foundation, Mellanox Technologies
Differential revision: https://reviews.freebsd.org/D14384


# 5bd01497 14-Feb-2018 Conrad Meyer <cem@FreeBSD.org>

x86 pmap: Make memory mapped via pmap_qenter() non-executable

The idea is, the pmap_qenter() API is now defined to not produce executable
mappings. If you need executable mappings, use another API.

Add pg_nx flag in pmap_qenter on x86 to make kernel pages non-executable.

Other architectures that support execute-specific permissons on page table
entries should subsequently be updated to match.

Submitted by: Darrick Lew <darrick.freebsd AT gmail.com>
Reviewed by: markj
Discussed with: alc, jhb, kib
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14062


# e958ad4c 12-Feb-2018 Jeff Roberson <jeff@FreeBSD.org>

Make v_wire_count a per-cpu counter(9) counter. This eliminates a
significant source of cache line contention from vm_page_alloc(). Use
accessors and vm_page_unwire_noq() so that the mechanism can be easily
changed in the future.

Reviewed by: markj
Discussed with: kib, glebius
Tested by: pho (earlier version)
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14273


# ab7c09f1 08-Feb-2018 Mark Johnston <markj@FreeBSD.org>

Use vm_page_unwire_noq() instead of directly modifying page wire counts.

No functional change intended.

Reviewed by: alc, kib (previous revision)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D14266


# c8f9c1f3 27-Jan-2018 Konstantin Belousov <kib@FreeBSD.org>

Use PCID to optimize PTI.

Use PCID to avoid complete TLB shootdown when switching between user
and kernel mode with PTI enabled.

I use the model close to what I read about KAISER, user-mode PCID has
1:1 correspondence to the kernel-mode PCID, by setting bit 11 in PCID.
Full kernel-mode TLB shootdown is performed on context switches, since
KVA TLB invalidation only works in the current pmap. User-mode part of
TLB is flushed on the pmap activations as well.

Similarly, IPI TLB shootdowns must handle both kernel and user address
spaces for each address. Note that machines which implement PCID but
do not have INVPCID instructions, cause the usual complications in the
IPI handlers, due to the need to switch to the target PCID temporary.
This is racy, but because for PCID/no-INVPCID we disable the
interrupts in pmap_activate_sw(), IPI handler cannot see inconsistent
state of CPU PCID vs PCPU pmap/kcr3/ucr3 pointers.

On the other hand, on kernel/user switches, CR3_PCID_SAVE bit is set
and we do not clear TLB.

I can imagine alternative use of PCID, where there is only one PCID
allocated for the kernel pmap. Then, there is no need to shootdown
kernel TLB entries on context switch. But copyout(3) would need to
either use method similar to proc_rwmem() to access the userspace
data, or (in reverse) provide a temporal mapping for the kernel buffer
into user mode PCID and use trampoline for copy.

Reviewed by: markj (previous version)
Tested by: pho
Discussed with: alc (some aspects)
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D13985


# bd50262f 17-Jan-2018 Konstantin Belousov <kib@FreeBSD.org>

PTI for amd64.

The implementation of the Kernel Page Table Isolation (KPTI) for
amd64, first version. It provides a workaround for the 'meltdown'
vulnerability. PTI is turned off by default for now, enable with the
loader tunable vm.pmap.pti=1.

The pmap page table is split into kernel-mode table and user-mode
table. Kernel-mode table is identical to the non-PTI table, while
usermode table is obtained from kernel table by leaving userspace
mappings intact, but only leaving the following parts of the kernel
mapped:

kernel text (but not modules text)
PCPU
GDT/IDT/user LDT/task structures
IST stacks for NMI and doublefault handlers.

Kernel switches to user page table before returning to usermode, and
restores full kernel page table on the entry. Initial kernel-mode
stack for PTI trampoline is allocated in PCPU, it is only 16
qwords. Kernel entry trampoline switches page tables. then the
hardware trap frame is copied to the normal kstack, and execution
continues.

IST stacks are kept mapped and no trampoline is needed for
NMI/doublefault, but of course page table switch is performed.

On return to usermode, the trampoline is used again, iret frame is
copied to the trampoline stack, page tables are switched and iretq is
executed. The case of iretq faulting due to the invalid usermode
context is tricky, since the frame for fault is appended to the
trampoline frame. Besides copying the fault frame and original
(corrupted) frame to kstack, the fault frame must be patched to make
it look as if the fault occured on the kstack, see the comment in
doret_iret detection code in trap().

Currently kernel pages which are mapped during trampoline operation
are identical for all pmaps. They are registered using
pmap_pti_add_kva(). Besides initial registrations done during boot,
LDT and non-common TSS segments are registered if user requested their
use. In principle, they can be installed into kernel page table per
pmap with some work. Similarly, PCPU can be hidden from userspace
mapping using trampoline PCPU page, but again I do not see much
benefits besides complexity.

PDPE pages for the kernel half of the user page tables are
pre-allocated during boot because we need to know pml4 entries which
are copied to the top-level paging structure page, in advance on a new
pmap creation. I enforce this to avoid iterating over the all
existing pmaps if a new PDPE page is needed for PTI kernel mappings.
The iteration is a known problematic operation on i386.

The need to flush hidden kernel translations on the switch to user
mode make global tables (PG_G) meaningless and even harming, so PG_G
use is disabled for PTI case. Our existing use of PCID is
incompatible with PTI and is automatically disabled if PTI is
enabled. PCID can be forced on only for developer's benefit.

MCE is known to be broken, it requires IST stack to operate completely
correctly even for non-PTI case, and absolutely needs dedicated IST
stack because MCE delivery while trampoline did not switched from PTI
stack is fatal. The fix is pending.

Reviewed by: markj (partially)
Tested by: pho (previous version)
Discussed with: jeff, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# ab3185d1 12-Jan-2018 Jeff Roberson <jeff@FreeBSD.org>

Implement NUMA support in uma(9) and malloc(9). Allocations from specific
domains can be done by the _domain() API variants. UMA also supports a
first-touch policy via the NUMA zone flag.

The slab layer is now segregated by VM domains and is precise. It handles
iteration for round-robin directly. The per-cpu cache layer remains
a mix of domains according to where memory is allocated and freed. Well
behaved clients can achieve perfect locality with no performance penalty.

The direct domain allocation functions have to visit the slab layer and
so require per-zone locks which come at some expense.

Reviewed by: Attilio (a slightly older version)
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon


# d5d56007 18-Dec-2017 Bruce Evans <bde@FreeBSD.org>

Also forgotten in the previous that removed the permanent double mapping
of low physical memory:

Update the comment about leaving the permanent mapping in place. This
also improves the wording of the comment. PTD 0 is still left alone
because it is fairly important that it was unmapped earlier, and the
comment now describes the unmapping of the other low PTDs that the code
actually does.

Reviewed by: kib


# 3f21dc29 18-Dec-2017 Bruce Evans <bde@FreeBSD.org>

Fix the undersupported option KERNLOAD, part 1: fix crashes in locore
when KERNLOAD is not a multiple of NBPDR (not the default) and PSE is
enabled (the default if the CPU supports it). Addresses in PDEs must
be a multiple of NBPDR in the PSE case, but were not so in the crashing
case.

KERNLOAD defaults to NBPDR. NBPDR is 4 MB for !PAE and 2 MB for PAE.
The default can be changed by editing i386/include/vmparam.h or using
makeoptions. It can be changed to less than NBPDR to save real and
virtual memory at a small cost in time, or to more than NBPDR to waste
real and virtual memory. It must be larger than 1 MB and a multiple of
PAGE_SIZE. When it is less than NBPDR, it is necessarily not a multiple
of NBPDR. This case has much larger bugs which will be fixed in part 2.

The fix is to only use PSE for physical addresses above <KERNLOAD
rounded _up_ to an NBPDR boundary>. When the rounding is non-null,
this leaves part of the kernel not using large pages. Rounding down
would avoid this pessimization, but would break setting of PAT bits
on i/o pages if it goes below 1MB. Since rounding down always goes
below 1MB when KERNLOAD < NBPDR and the KERNLOAD > NBPDR case is not
useful, never round down.

Fix related style bugs (e.g., wrong literal values for NBPDR in comments).

Reviewed by: kib


# fb3cc1c3 07-Dec-2017 Bruce Evans <bde@FreeBSD.org>

Move instantiation of msgbufp from 9 MD files to subr_prf.c.

This variable should be pure MI except possibly for reading it in MD
dump routines. Its initialization was pure MD in 4.4BSD, but FreeBSD
changed this in r36441 in 1998. There were many imperfections in
r36441. This commit fixes only a small one, to simplify fixing the
others 1 arch at a time. (r47678 added support for
special/early/multiple message buffer initialization which I want in
a more general form, but this was too fragile to use because hacking
on the msgbufp global corrupted it, and was only used for 5 hours in
-current...)


# df57947f 18-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

spdx: initial adoption of licensing ID tags.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes: yes
Differential Revision: https://reviews.freebsd.org/D13133


# 4e421792 16-Nov-2017 Konstantin Belousov <kib@FreeBSD.org>

Remove i386 XBOX support.

It is for console presented at 2001 and featuring Pentium III
processor. Even if any of them are still alive and run FreeBSD, we do
not have any sign of life from their users. While removing another
dozens of #ifdefs from the i386 sources reduces the aversion from
looking at the code and improves the platform vitality.

Reviewed by: cem, pfg, rink (XBOX support author)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D13016


# 5fca1d90 23-Oct-2017 Mark Johnston <markj@FreeBSD.org>

Fix the VM_NRESERVLEVEL == 0 build.

Add VM_NRESERVLEVEL guards in the pmaps that implement transparent
superpage promotion using reservations.

Reviewed by: alc, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D12764


# 2375aaa8 31-Jul-2017 Mark Johnston <markj@FreeBSD.org>

Batch updates to v_wire_count when freeing page table pages on x86.

The removed release stores are not needed since stores are totally
ordered on i386 and amd64.

Reviewed by: alc, kib (previous revision)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11790


# 510cdf22 01-Jul-2017 Alan Cox <alc@FreeBSD.org>

When "force" is specified to pmap_invalidate_cache_range(), the given
start address is not required to be page aligned. However, the loop
within pmap_invalidate_cache_range() that performs the actual cache
line invalidations requires that the starting address be truncated to
a multiple of the cache line size. This change corrects an error in
that truncation.

Submitted by: Brett Gutstein <bgutstein@rice.edu>
Reviewed by: kib
MFC after: 1 week


# 67d955aa 08-Apr-2017 Patrick Kelsey <pkelsey@FreeBSD.org>

Corrected misspelled versions of rendezvous.

The MFC will include a compat definition of smp_no_rendevous_barrier()
that calls smp_no_rendezvous_barrier().

Reviewed by: gnn, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D10313


# 03149668 26-Feb-2017 Alan Cox <alc@FreeBSD.org>

Refine the fix from r312954. Specifically, add a new PDE-only flag,
PG_PROMOTED, that indicates whether lingering 4KB page mappings might
need to be flushed on a PDE change that restricts or destroys a 2MB
page mapping. This flag allows the pmap to avoid range invalidations
that are both unnecessary and costly.

Reviewed by: kib, markj
MFC after: 6 weeks
Differential Revision: https://reviews.freebsd.org/D9665


# dab48644 18-Feb-2017 Konstantin Belousov <kib@FreeBSD.org>

MFamd64 r313933: microoptimize pmap_protect_pde().

Noted by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 57f6622f 02-Feb-2017 Konstantin Belousov <kib@FreeBSD.org>

For i386, remove config options CPU_DISABLE_CMPXCHG, CPU_DISABLE_SSE
and device npx.

This means that FPU is always initialized and handled when available,
and SSE+ register file and exception are handled when available. This
makes the kernel FPU code much easier to maintain by the cost of
slight bloat for CPUs older than 25 years.

CPU_DISABLE_CMPXCHG outlived its usefulness, see the removed comment
explaining the original purpose.

Suggested by and discussed with: bde
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks


# a0f64f38 29-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

Do not leave stale 4K TLB entries on pde (superpage) removal or
protection change.

On superpage promotion, x86 pmaps do not invalidate existing 4K
entries for the superpage range, because they are compatible with the
promoted 2/4M entry. But the invalidation on superpage removal or
protection change only did single INVLPG with the base address of the
superpage. This reliably flushed superpage TLB entry, and 4K entry
for the first page of the superpage, potentially leaving other 4K TLB
entries lingering. Do the invalidation of the whole superpage range
to correct the problem.

Note that the precise invalidation is done by x86 code for kernel_pmap
only, for user pmaps whole (per-AS) TLB is flushed. This made the bug
well hidden, because promotions of the kernel mappings require
specific load.

Reported and tested by: Jonathan Looney <jtl@netflix.com> (previous version)
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# d9864508 29-Jan-2017 Jason A. Harmening <jah@FreeBSD.org>

Implement get_pcpu() for i386 and use it to replace pcpu_find(curcpu)
in the i386 pmap.

The curcpu macro loads the per-cpu data pointer as its first step,
so the remaining steps of pcpu_find(curcpu) are circular.

get_pcpu() is already implemented for arm, arm64, and risc-v.
My plan is to implement it for the remaining architectures and use
it to replace several instances of pcpu_find(curcpu) in MI code.

Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D9370


# 5611aaa1 20-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

Use SFENCE for ordering CLFLUSHOPT.

SDM states that CLFLUSHOPT instructions can be ordered with other
writes by SFENCE, heavier MFENCE is not required.

Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 86785b54 14-Jan-2017 Jason A. Harmening <jah@FreeBSD.org>

Add comment explaining relative order of sched_unpin() and mtx_unlock().

Suggested by: alc
MFC after: 1 week


# 28699efd 14-Jan-2017 Jason A. Harmening <jah@FreeBSD.org>

For i386 temporary mappings, unpin the thread before releasing
the cmap lock. Releasing the lock first may result in the thread
being immediately rescheduled and bound to the same CPU, only to
unpin itself upon resuming execution.

Noted by: skra (in review for armv6 equivalent)
MFC after: 1 week


# bd7abab0 10-Jan-2017 Mark Johnston <markj@FreeBSD.org>

Coalesce TLB shootdowns of global PTEs in pmap_advise() on x86.

We would previously invalidate such entries individually, resulting in more
IPIs than necessary.

Reviewed by: alc, kib
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D9094


# 43aabbef 23-Dec-2016 Jason A. Harmening <jah@FreeBSD.org>

Move the objects used to create temporary mappings for i386 pmap zero and copy
operations to the MD PCPU region. Change sysmap initialization to only
allocate KVA pages for CPUs that are actually present. As a minor
optimization, this also prevents false sharing between adjacent sysmap objects
since the pcpu struct is already cacheline-aligned.

While here, move pc_qmap_addr initialization for the BSP into
pmap_bootstrap(), which allows use of pmap_quick* functions during early boot.

Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D8833


# e94965d8 07-Dec-2016 Alan Cox <alc@FreeBSD.org>

Previously, vm_radix_remove() would panic if the radix trie didn't
contain a vm_page_t at the specified index. However, with this
change, vm_radix_remove() no longer panics. Instead, it returns NULL
if there is no vm_page_t at the specified index. Otherwise, it
returns the vm_page_t. The motivation for this change is that it
simplifies the use of radix tries in the amd64, arm64, and i386 pmap
implementations. Instead of performing a lookup before every remove,
the pmap can simply perform the remove.

Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D8708


# d3e4d71f 28-Oct-2016 Konstantin Belousov <kib@FreeBSD.org>

Handle pmap_enter() over an existing 4/2M page in KVA on i386.

The userspace case was already handled by pmap_allocpte(). For kernel
VA, page table page must exist, and demote cannot fail, so we need to
just call pmap_demote_pde(). Also note that due to the machine AS
layout, promotions in the KVA on i386 are highly unlikely, so this
change is mostly for completeness.

Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D8323


# 8cb0c102 10-Sep-2016 Alan Cox <alc@FreeBSD.org>

Various changes to pmap_ts_referenced()

Move PMAP_TS_REFERENCED_MAX out of the various pmap implementations and
into vm/pmap.h, and describe what its purpose is. Eliminate the archaic
"XXX" comment about its value. I don't believe that its exact value, e.g.,
5 versus 6, matters.

Update the arm64 and riscv pmap implementations of pmap_ts_referenced()
to opportunistically update the page's dirty field.

On amd64, use the PDE value already cached in a local variable rather than
dereferencing a pointer again and again.

Reviewed by: kib, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D7836


# dbbaf04f 03-Sep-2016 Mark Johnston <markj@FreeBSD.org>

Remove support for idle page zeroing.

Idle page zeroing has been disabled by default on all architectures since
r170816 and has some bugs that make it seemingly unusable. Specifically,
the idle-priority pagezero thread exacerbates contention for the free page
lock, and yields the CPU without releasing it in non-preemptive kernels. The
pagezero thread also does not behave correctly when superpage reservations
are enabled: its target is a function of v_free_count, which includes
reserved-but-free pages, but it is only able to zero pages belonging to the
physical memory allocator.

Reviewed by: alc, imp, kib
Differential Revision: https://reviews.freebsd.org/D7714


# 53aadae6 01-Sep-2016 Alan Cox <alc@FreeBSD.org>

As an optimization to the machine-independent layer, change the machine-
dependent pmap_ts_referenced() so that it updates the page's dirty field
if a modified bit is found while counting reference bits. This
opportunistic update can be performed at low cost and can eliminate the
need for some future calls to pmap_is_modified() by the machine-
independent layer.

Reviewed by: kib, markj
MFC after: 3 weeks
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D7722


# ef209971 29-Aug-2016 Bruce Evans <bde@FreeBSD.org>

Shorten banal comments about zeroing and copying pages. Don't give
implementation details that last echoed the code 15-20 years ago.
But add a detail about pagezero() on i386. Switch from Mach style
to BSD style.


# fdb6320d 13-Jul-2016 Eric Badger <badger@FreeBSD.org>

Add explicit detection of KVM hypervisor

Set vm_guest to a new enum value (VM_GUEST_KVM) when kvm is detected and use
vm_guest in conditionals testing for KVM.

Also, fix a conditional checking if we're running in a VM which caught only
the generic VM case, but not more specific VMs (KVM, VMWare, etc.). (Spotted
by: vangyzen).

Differential revision: https://reviews.freebsd.org/D7172
Sponsored by: Dell Inc.
Approved by: kib (mentor), vangyzen (mentor)
Reviewed by: alc
MFC after: 4 weeks


# a3269b08 14-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

x86: for pointers replace 0 with NULL.

These are mostly cosmetical, no functional change.

Found with devel/coccinelle.


# 10386b56 06-Dec-2015 Conrad Meyer <cem@FreeBSD.org>

pmap_invalidate_range: For very large ranges, flush the whole TLB

Typical TLBs have 40-512 entries available. At some point, iterating
every single page in a requested invalidation range and issuing invlpg
on it is more expensive than flushing the TLB and allowing it to reload
on demand.

Broadwell CPUs have 1536 L2 TLB entries, so I've picked the arbitrary
number 4096 entries as a hueristic at which point we flush TLB rather
than invalidating every single potential page.

Reviewed by: alc
Feedback from: jhb, kib
MFC notes: Depends on r291688
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D4280


# 1ac62762 05-Dec-2015 Konstantin Belousov <kib@FreeBSD.org>

In the pmap_set_pg() function, which enables the global bit on the
ptes mapping the kernel on CPUs where global TLB entries are
supported, revert to flushing only non-global entries, i.e. to the
pre-r291688 state. There is no need to flush global TLB entries,
since only global entries created during the previous iterations of
the loop could exist at this moment.

Submitted by: alc
Differential revision: https://reviews.freebsd.org/D4368


# 27691a24 03-Dec-2015 Konstantin Belousov <kib@FreeBSD.org>

For amd64 non-PCID machines, and for i386 machines with support for
the PG_G global pte flag, pmap_invalidate_all() fails to flush global
TLB entries [*]. This is because TLB shootdown handler for such
configs reloads CR3, and on i386 pmap_invalidate_all() does the same
for the initiating CPU. Note that current code does not issue total
invalidation requests for the kernel_pmap.

Rename amd64 function invltlb_globpcid() to invltlb_glob(), it is not
specific for PCID for quite some time, and implement the same
functionality for i386. Use the function instead of invltlb() in
shootdown handlers and in i386 pmap_invalidate_all(), but only for the
kernel pmap (which maps pages with the PG_G attribute set), which
takes care of PG_G TLB entries on flush.

To detect the affected pmap in i386 TLB shootdown handler, pmap should
be passed to the smp_masked_invltlb() function, which makes amd64 and
i386 TLB shootdown code almost identical. Merge the code under x86/.

Noted by: jhb [*]
Reviewed by: cem, jhb, pho
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D4346


# 8e9ef12d 27-Oct-2015 Hans Petter Selasky <hselasky@FreeBSD.org>

Build fix for i386/XBOX and pc98/GENERIC.

Reviewed by: kib


# af95bbf5 24-Oct-2015 Konstantin Belousov <kib@FreeBSD.org>

Intel SDM before revision 56 described the CLFLUSH instruction as only
ordered with the MFENCE instruction. Similar weak guarantees are also
specified by the AMD APM vol. 3 rev. 3.22. x86 pmap methods
pmap_invalidate_cache_range() and pmap_invalidate_cache_pages() braced
CLFLUSH loop with MFENCE both before and after the loop.

In the revision 56 of SDM, Intel stated that all existing
implementations of CLFLUSH are strict, CLFLUSH instructions execution
is ordered WRT other CLFLUSH and writes. Also, the strict behaviour
is made architectural.

A new instruction CLFLUSHOPT (which was documented for some time in
the Instruction Set Extensions Programming Reference) provides the
weak behaviour which was previously attributed to CLFLUSH.

Use CLFLUSHOPT when available. When CLFLUSH is used on Intel CPUs, do
not execute MFENCE before and after the flushing loop.

Reviewed by: alc
Sponsored by: The FreeBSD Foundation


# 9f86aba6 26-Sep-2015 Alan Cox <alc@FreeBSD.org>

Exploit r288122 to address a cosmetic issue. Since PV chunk pages don't
belong to a vm object, they can't be paged out. Since they can't be paged
out, they are never enqueued in a paging queue. Nonetheless, passing
PQ_INACTIVE to vm_page_unwire() creates the appearance that these pages
are being enqueued in the inactive queue. As of r288122, we can avoid
this false impression by passing PQ_NONE.

Submitted by: kmacy (an earlier version)
Differential Revision: https://reviews.freebsd.org/D1674


# 7ef5e8bc 12-Aug-2015 Marcel Moolenaar <marcel@FreeBSD.org>

Better support memory mapped console devices, such as VGA and EFI
frame buffers and memory mapped UARTs.

1. Delay calling cninit() until after pmap_bootstrap(). This makes
sure we have PMAP initialized enough to add translations. Keep
kdb_init() after cninit() so that we have console when we need
to break into the debugger on boot.
2. Unfortunately, the ATPIC code had be moved as well so as to
avoid a spurious trap #30. The reason for which is not known
at this time.
3. In pmap_mapdev_attr(), when we need to map a device prior to the
VM system being initialized, use virtual_avail as the KVA to map
the device at. In particular, avoid using the direct map on amd64
because we can't demote by virtue of not being able to allocate
yet. Keep track of the translation.
Re-use the translation after the VM has been initialized to not
waste KVA and to satisfy the assumption in uart(4) that the handle
returned for the low-level console is the same as later returned
when the device is probed and attached.
4. In pmap_unmapdev() remove the mapping from the table when called
pre-init. Otherwise keep the mapping. During bus probe and attach
device resources are mapped and unmapped multiple times, which
would have us destroy the mapping used by the low-level console.
5. In pmap_init(), set pmap_initialized to signal that we're not
pre-init anymore. On amd64, bring the direct map in sync with the
translations created at that time.
6. Implement bus_space_map() and bus_space_unmap() for real: when
the tag corresponds to memory space, call the corresponding
pmap_mapdev() and pmap_unmapdev() functions to construct and
actual handle.
7. In efifb.c and vt_vga.c, remove the crutches and hacks and simply
call pmap_mapdev_attr() or bus_space_map() as desired.

Notes:
1. uart(4) already used bus_space_map() during low-level console
setup but since serial ports have traditionally been I/O port
based, the lack of a proper implementation for said function
was not a problem. It has always supported memory mapped UARTs
for low-level consoles by setting hw.uart.console accordingly.
2. The use of the direct map on amd64 without setting caching
attributes has been a bigger problem than previously thought.
This change has the fortunate (and unexpected) side-effect of
fixing various EFI frame buffer problems (though not all).

PR: 191564, 194952

Special thanks to:
1. XipLink, Inc -- generously donated an Intel Bay Trail E3800
based eval board (ADLE3800PC).
2. The FreeBSD Foundation, in particular emaste@ -- for UEFI
support in general and testing.
3. Everyone who tested the proposed for PR 191564.
4. jhb@ and kib@ for being a soundboard and applying a clue bat
if so needed.


# c8fbdcc1 05-Aug-2015 Konstantin Belousov <kib@FreeBSD.org>

Fix UP build after r286296, ensure that CPU_FOREACH() is defined.

Sponsored by: The FreeBSD Foundation


# 713841af 04-Aug-2015 Jason A. Harmening <jah@FreeBSD.org>

Add two new pmap functions:
vm_offset_t pmap_quick_enter_page(vm_page_t m)
void pmap_quick_remove_page(vm_offset_t kva)

These will create and destroy a temporary, CPU-local KVA mapping of a specified page.

Guarantees:
--Will not sleep and will not fail.
--Safe to call under a non-sleepable lock or from an ithread

Restrictions:
--Not guaranteed to be safe to call from an interrupt filter or under a spin mutex on all platforms
--Current implementation does not guarantee more than one page of mapping space across all platforms. MI code should not make nested calls to pmap_quick_enter_page.
--MI code should not perform locking while holding onto a mapping created by pmap_quick_enter_page

The idea is to use this in busdma, for bounce buffer copies as well as virtually-indexed cache maintenance on mips and arm.

NOTE: the non-i386, non-amd64 implementations of these functions still need review and testing.

Reviewed by: kib
Approved by: kib (mentor)
Differential Revision: http://reviews.freebsd.org/D3013


# 79855a57 29-Jul-2015 Sean Bruno <sbruno@FreeBSD.org>

Remove dead functions pmap_pvdump and pads.

Differential Revision: D3206
Submitted by: kevin.bowling@kev009.com
Reviewed by: alc


# 65a9768f 09-Jun-2015 Alan Cox <alc@FreeBSD.org>

Account for superpage mappings that are created by pmap_copy().


# 1c8e7232 18-Apr-2015 Konstantin Belousov <kib@FreeBSD.org>

Remove lazy pmap switch code from i386. Naive benchmark with md(4)
shows no difference with the code removed.

On both amd64 and i386, assert that a released pmap is not active.

Proposed and reviewed by: alc
Discussed with: Svatopluk Kraus <onwahe@gmail.com>, peter
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 34c15db9 13-Apr-2015 Konstantin Belousov <kib@FreeBSD.org>

Add config option PAE_TABLES for the i386 kernel. It switches pmap to
use PAE format for the page tables, but does not incur other
consequences of the full PAE config. In particular, vm_paddr_t and
bus_addr_t are left 32bit, and max supported memory is still limited
by 4GB.

The option allows to have nx permissions for memory mappings on i386
kernel, while keeping the usual i386 KBI and avoiding the kernel data
sizing problems typical for the PAE config.

Intel documented that the PAE format for page tables is available
starting with the Pentium Pro, but it is possible that the plain
Pentium CPUs have the required support (Appendix H). The goal is to
enable the option and non-exec mappings on i386 for the GENERIC
kernel. Anybody wanting a useful system on 486, have to reconfigure
the modern i386 kernel anyway.

Discussed with: alc, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# f2c2231e 31-Mar-2015 Ryan Stone <rstone@FreeBSD.org>

Fix integer truncation bug in malloc(9)

A couple of internal functions used by malloc(9) and uma truncated
a size_t down to an int. This could cause any number of issues
(e.g. indefinite sleeps, memory corruption) if any kernel
subsystem tried to allocate 2GB or more through malloc. zfs would
attempt such an allocation when run on a system with 2TB or more
of RAM.

Note to self: When this is MFCed, sparc64 needs the same fix.

Differential revision: https://reviews.freebsd.org/D2106
Reviewed by: kib
Reported by: Michael Fuckner <michael@fuckner.net>
Tested by: Michael Fuckner <michael@fuckner.net>
MFC after: 2 weeks


# 271f0f12 15-Nov-2014 Alan Cox <alc@FreeBSD.org>

Enable the use of VM_PHYSSEG_SPARSE on amd64 and i386, making it the default
on i386 PAE. Previously, VM_PHYSSEG_SPARSE could not be used on amd64 and
i386 because vm_page_startup() would not create vm_page structures for the
kernel page table pages allocated during pmap_bootstrap() but those vm_page
structures are needed when the kernel attempts to promote the corresponding
kernel virtual addresses to superpage mappings. To address this problem, a
new public function, vm_phys_add_seg(), is introduced and vm_phys_init() is
updated to reflect the creation of vm_phys_seg structures by calls to
vm_phys_add_seg().

Discussed with: Svatopluk Kraus
MFC after: 3 weeks
Sponsored by: EMC / Isilon Storage Division


# d6e53ebe 26-Oct-2014 Alan Cox <alc@FreeBSD.org>

By the time that pmap_init() runs, vm_phys_segs[] has been initialized. Obtaining
the end of memory address from vm_phys_segs[] is a little easier than obtaining it
from phys_avail[].

Discussed with: Svatopluk Kraus


# 07a92f34 08-Oct-2014 Konstantin Belousov <kib@FreeBSD.org>

Add an argument to the x86 pmap_invalidate_cache_range() to request
forced invalidation of the cache range regardless of the presence of
self-snoop feature. Some recent Intel GPUs in some modes are not
coherent, and dirty lines in CPU cache must be flushed before the
pages are transferred to GPU domain.

Reviewed by: alc (previous version)
Tested by: pho (amd64)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 490356e5 17-Sep-2014 Konstantin Belousov <kib@FreeBSD.org>

Presence of any VM_PROT bits in the permission argument on x86 implies
that the entry is readable and valid.

Reported by: markj
Submitted by: alc
Tested by: pho (previous version), markj
MFC after: 3 days


# 5e1d15a8 16-Aug-2014 Konstantin Belousov <kib@FreeBSD.org>

Complete r254667, do not destroy pmap lock if KVA allocation failed.

Submitted by: Svatopluk Kraus <onwahe@gmail.com>
MFC after: 1 week


# 827a661d 11-Aug-2014 Alan Cox <alc@FreeBSD.org>

Change {_,}pmap_allocpte() so that they look for the flag PMAP_ENTER_NOSLEEP
instead of M_NOWAIT/M_WAITOK when deciding whether to sleep on page table
page allocation. (The same functions in the i386/xen and mips pmap
implementations already use PMAP_ENTER_NOSLEEP.)

X-MFC with: r269728
Sponsored by: EMC / Isilon Storage Division


# 39ffa8c1 08-Aug-2014 Konstantin Belousov <kib@FreeBSD.org>

Change pmap_enter(9) interface to take flags parameter and superpage
mapping size (currently unused). The flags includes the fault access
bits, wired flag as PMAP_ENTER_WIRED, and a new flag
PMAP_ENTER_NOSLEEP to indicate that pmap should not sleep.

For powerpc aim both 32 and 64 bit, fix implementation to ensure that
the requested mapping is created when PMAP_ENTER_NOSLEEP is not
specified, in particular, wait for the available memory required to
proceed.

In collaboration with: alc
Tested by: nwhitehorn (ppc aim32 and booke)
Sponsored by: The FreeBSD Foundation and EMC / Isilon Storage Division
MFC after: 2 weeks


# a695d9b2 03-Aug-2014 Alan Cox <alc@FreeBSD.org>

Retire pmap_change_wiring(). We have never used it to wire virtual pages.
We continue to use pmap_enter() for that. For unwiring virtual pages, we
now use pmap_unwire(), which unwires a range of virtual addresses instead
of a single virtual page.

Sponsored by: EMC / Isilon Storage Division


# 1c42633e 24-Jul-2014 Marius Strobl <marius@FreeBSD.org>

- Copying and zeroing pages via temporary mappings involves updating the
corresponding page tables followed by accesses to the pages in question.
This sequence is subject to the situation exactly described in the "AMD64
Architecture Programmer's Manual Volume 2: System Programming" rev. 3.23,
"7.3.1 Special Coherency Considerations" [1, p. 171 f.]. Therefore, issuing
the INVLPG right after modifying the PTE bits is crucial.
For pmap_copy_page(), this has been broken in r124956 and later on carried
over to pmap_copy_pages() derived from the former, while all other places
in the i386 PMAP code use the correct order of instructions in this regard.
Fixing the latter breakage solves the problem of data corruption seen with
unmapped I/O enabled when running at least bare metal on AMD R-268D APUs.
However, this might also fix similar corruption reported for virtualized
environments.
- In pmap_copy_pages(), correctly set the cache bits on the source page being
copied. This change is thought to be a NOP for the real world, though. [2]

1: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf

Submitted by: kib [2]
Reviewed by: alc, kib
MFC after: 3 days
Sponsored by: Bally Wulff Games & Entertainment GmbH


# 09132ba6 06-Jul-2014 Alan Cox <alc@FreeBSD.org>

Introduce pmap_unwire(). It will replace pmap_change_wiring(). There are
several reasons for this change:

pmap_change_wiring() has never (in my memory) been used to set the wired
attribute on a virtual page. We have always used pmap_enter() to do that.
Moreover, it is not really safe to use pmap_change_wiring() to set the wired
attribute on a virtual page. The description of pmap_change_wiring() says
that it assumes the existence of a mapping in the pmap. However, non-wired
mappings may be reclaimed by the pmap at any time. (See pmap_collect().)
Many implementations of pmap_change_wiring() will crash if the mapping does
not exist.

pmap_unwire() accepts a range of virtual addresses, whereas
pmap_change_wiring() acts upon a single virtual page. Since we are
typically unwiring a range of virtual addresses, pmap_unwire() will be more
efficient. Moreover, pmap_unwire() allows us to unwire superpage mappings.
Previously, we were forced to demote the superpage mapping, because
pmap_change_wiring() only allowed us to express the unwiring of a single
base page mapping at a time. This added to the overhead of unwiring for
large ranges of addresses, including the implicit unwiring that occurs at
process termination.

Implementations for arm and powerpc will follow.

Discussed with: jeff, marcel
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division


# af3b2549 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Pull in r267961 and r267973 again. Fix for issues reported will follow.


# 37a107a4 27-Jun-2014 Glen Barber <gjb@FreeBSD.org>

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


# 3da1cf1e 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


# 3ae10f74 16-Jun-2014 Attilio Rao <attilio@FreeBSD.org>

- Modify vm_page_unwire() and vm_page_enqueue() to directly accept
the queue where to enqueue pages that are going to be unwired.
- Add stronger checks to the enqueue/dequeue for the pagequeues when
adding and removing pages to them.

Of course, for unmanaged pages the queue parameter of vm_page_unwire() will
be ignored, just as the active parameter today.
This makes adding new pagequeues quicker.

This change effectively modifies the KPI. __FreeBSD_version will be,
however, bumped just when the full cache of free pages will be
evicted.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc
Tested by: pho


# dd05fa19 07-Jun-2014 Alan Cox <alc@FreeBSD.org>

Add a page size field to struct vm_page. Increase the page size field when
a partially populated reservation becomes fully populated, and decrease this
field when a fully populated reservation becomes partially populated.

Use this field to simplify the implementation of pmap_enter_object() on
amd64, arm, and i386.

On all architectures where we support superpages, the cost of creating a
superpage mapping is roughly the same as creating a base page mapping. For
example, both kinds of mappings entail the creation of a single PTE and PV
entry. With this in mind, use the page size field to make the
implementation of vm_map_pmap_enter(..., MAP_PREFAULT_PARTIAL) a little
smarter. Previously, if MAP_PREFAULT_PARTIAL was specified to
vm_map_pmap_enter(), that function would only map base pages. Now, it will
create up to 96 base page or superpage mappings.

Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division


# 44f1c916 22-Mar-2014 Bryan Drewery <bdrewery@FreeBSD.org>

Rename global cnt to vm_cnt to avoid shadowing.

To reduce the diff struct pcu.cnt field was not renamed, so
PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in
kvm(3) and vmstat(8). The goal was to not affect externally used KPI.

Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the
the global cnt variable.

Exp-run revealed no ports using it directly.

No objection from: arch@
Sponsored by: EMC / Isilon Storage Division


# f4385473 22-Feb-2014 Alan Cox <alc@FreeBSD.org>

When the kernel is running in a virtual machine, it cannot rely upon the
processor family to determine if the workaround for AMD Family 10h Erratum
383 should be enabled. To enable virtual machine migration among a
heterogeneous collection of physical machines, the hypervisor may have
been configured to report an older processor family with a reduced feature
set. Effectively, the reported processor family and its features are like
a "least common denominator" for the collection of machines.

Therefore, when the kernel is running in a virtual machine, instead of
relying upon the processor family, we now test for features that prove
that the underlying processor is not affected by the erratum. (The
features that we test for are unlikely to ever be emulated in software
on an affected physical processor.)

PR: 186061
Tested by: Simon Matter
Discussed with: jhb, neel
MFC after: 2 weeks


# 4f67a8c5 11-Feb-2014 John Baldwin <jhb@FreeBSD.org>

Don't waste a page of KVA for the boot-time memory test on x86. For amd64,
reuse the first page of the crashdumpmap as CMAP1/CADDR1. For i386,
remove CMAP1/CADDR1 entirely and reuse CMAP3/CADDR3 for the memory test.

Reviewed by: alc, peter
MFC after: 2 weeks


# e07ef9b0 23-Jan-2014 John Baldwin <jhb@FreeBSD.org>

Move <machine/apicvar.h> to <x86/apicvar.h>.


# deb179bb 19-Sep-2013 Alan Cox <alc@FreeBSD.org>

The pmap function pmap_clear_reference() is no longer used. Remove it.

pmap_clear_reference() has had exactly one caller in the kernel for
several years, more precisely, since FreeBSD 8. Now, that call no
longer exists.

Approved by: re (kib)
Sponsored by: EMC / Isilon Storage Division


# 06646d66 16-Sep-2013 Konstantin Belousov <kib@FreeBSD.org>

Merge the change r255607 from amd64 to i386.

Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (gjb)


# 87ee6303 11-Sep-2013 Alan Cox <alc@FreeBSD.org>

Prior to r254304, we only began scanning the active page queue when the
amount of free memory was close to the point at which we would begin
reclaiming pages. Now, we continuously scan the active page queue,
regardless of the amount of free memory. Consequently, we are continuously
calling pmap_ts_referenced() on active pages.

Prior to this change, pmap_ts_referenced() would always demote superpage
mappings in order to obtain finer-grained reference information. This made
sense because we were coming under memory pressure and would soon have to
begin reclaiming pages. Now, however, with continuous scanning of the
active page queue, these demotions are taking a toll on performance. To
address this problem, I have replaced the demotion with a heuristic for
periodically clearing the reference flag on superpage mappings.

Approved by: re (kib)
Sponsored by: EMC / Isilon Storage Division


# 51321f7c 29-Aug-2013 Alan Cox <alc@FreeBSD.org>

Significantly reduce the cost, i.e., run time, of calls to madvise(...,
MADV_DONTNEED) and madvise(..., MADV_FREE). Specifically, introduce a new
pmap function, pmap_advise(), that operates on a range of virtual addresses
within the specified pmap, allowing for a more efficient implementation of
MADV_DONTNEED and MADV_FREE. Previously, the implementation of
MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as
pmap_clear_reference(). Intuitively, the problem with this implementation
is that the pmap-level locks are acquired and released and the page table
traversed repeatedly, once for each resident page in the range
that was specified to madvise(2). A more subtle flaw with the previous
implementation is that pmap_clear_reference() would clear the reference bit
on all mappings to the specified page, not just the mapping in the range
specified to madvise(2).

Since our malloc(3) makes heavy use of madvise(2), this change can have a
measureable impact. For example, the system time for completing a parallel
"buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%.

Note: This change only contains pmap_advise() implementations for a subset
of our supported architectures. I will commit implementations for the
remaining architectures after further testing. For now, a stub function is
sufficient because of the advisory nature of pmap_advise().

Discussed with: jeff, jhb, kib
Tested by: pho (i386), marcel (ia64)
Sponsored by: EMC / Isilon Storage Division


# e68c64f0 22-Aug-2013 Konstantin Belousov <kib@FreeBSD.org>

Revert r254501. Instead, reuse the type stability of the struct pmap
which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE
zone. Initialize the pmap lock in the vmspace zone init function, and
remove pmap lock initialization and destruction from pmap_pinit() and
pmap_release().

Suggested and reviewed by: alc (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation


# c325e866 10-Aug-2013 Konstantin Belousov <kib@FreeBSD.org>

Different consumers of the struct vm_page abuse pageq member to keep
additional information, when the page is guaranteed to not belong to a
paging queue. Usually, this results in a lot of type casts which make
reasoning about the code correctness harder.

Sometimes m->object is used instead of pageq, which could cause real
and confusing bugs if non-NULL m->object is leaked. See r141955 and
r253140 for examples.

Change the pageq member into a union containing explicitly-typed
members. Use them instead of type-punning or abusing m->object in x86
pmaps, uma and vm_page_alloc_contig().

Requested and reviewed by: alc
Sponsored by: The FreeBSD Foundation


# e946b949 09-Aug-2013 Attilio Rao <attilio@FreeBSD.org>

On all the architectures, avoid to preallocate the physical memory
for nodes used in vm_radix.
On architectures supporting direct mapping, also avoid to pre-allocate
the KVA for such nodes.

In order to do so make the operations derived from vm_radix_insert()
to fail and handle all the deriving failure of those.

vm_radix-wise introduce a new function called vm_radix_replace(),
which can replace a leaf node, already present, with a new one,
and take into account the possibility, during vm_radix_insert()
allocation, that the operations on the radix trie can recurse.
This means that if operations in vm_radix_insert() recursed
vm_radix_insert() will start from scratch again.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc (older version)
Reviewed by: jeff
Tested by: pho, scottl


# c7aebda8 09-Aug-2013 Attilio Rao <attilio@FreeBSD.org>

The soft and hard busy mechanism rely on the vm object lock to work.
Unify the 2 concept into a real, minimal, sxlock where the shared
acquisition represent the soft busy and the exclusive acquisition
represent the hard busy.
The old VPO_WANTED mechanism becames the hard-path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becames an interlock for this functionality:
it can be held in both read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.

Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
and vm_page_grab are being executed. This will be very helpful
once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavilly changed this is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.

Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff, kib
Tested by: gavin, bapt (older version)
Tested by: pho, scottl


# 5df87b21 07-Aug-2013 Jeff Roberson <jeff@FreeBSD.org>

Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by: alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division


# 2c0b86b4 04-Aug-2013 Jeff Roberson <jeff@FreeBSD.org>

- Introduce a specific function, pmap_remove_kernel_pde, for removing
huge pages in the kernel's address space. This works around several
asserts from pmap_demote_pde_locked that did not apply and gave false
warnings.

Discovered by: pho
Reviewed by: alc
Sponsored by: EMC / Isilon Storage Division


# 30dac21d 10-Jul-2013 Konstantin Belousov <kib@FreeBSD.org>

Explicitely panic instead of possibly doing undefined things when
ptelist KVA is exhausted. Currently this cannot happen, the added
panic serves as assert.

Discussed with: alc
Sponsored by: The FreeBSD Foundation


# 3fb25770 10-Jul-2013 Konstantin Belousov <kib@FreeBSD.org>

MFamd64 r253140:
Clear m->object for the page taken from the delayed free list in
pmap_pv_reclaim().

Noted by: alc


# 17a27377 13-Jun-2013 Jeff Roberson <jeff@FreeBSD.org>

- Add a BIT_FFS() macro and use it to replace cpusetffs_obj()

Discussed with: attilio
Sponsored by: EMC / Isilon Storage Division


# 9af6d512 21-May-2013 Attilio Rao <attilio@FreeBSD.org>

o Relax locking assertions for vm_page_find_least()
o Relax locking assertions for pmap_enter_object() and add them also
to architectures that currently don't have any
o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade
operation on the per-object rwlock
o Use all the mechanisms above to make vm_map_pmap_enter() to work
mostl of the times only with readlocks.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc


# ee75e7de 19-Mar-2013 Konstantin Belousov <kib@FreeBSD.org>

Implement the concept of the unmapped VMIO buffers, i.e. buffers which
do not map the b_pages pages into buffer_map KVA. The use of the
unmapped buffers eliminate the need to perform TLB shootdown for
mapping on the buffer creation and reuse, greatly reducing the amount
of IPIs for shootdown on big-SMP machines and eliminating up to 25-30%
of the system time on i/o intensive workloads.

The unmapped buffer should be explicitely requested by the GB_UNMAPPED
flag by the consumer. For unmapped buffer, no KVA reservation is
performed at all. The consumer might request unmapped buffer which
does have a KVA reserve, to manually map it without recursing into
buffer cache and blocking, with the GB_KVAALLOC flag.

When the mapped buffer is requested and unmapped buffer already
exists, the cache performs an upgrade, possibly reusing the KVA
reservation.

Unmapped buffer is translated into unmapped bio in g_vfs_strategy().
Unmapped bio carry a pointer to the vm_page_t array, offset and length
instead of the data pointer. The provider which processes the bio
should explicitely specify a readiness to accept unmapped bio,
otherwise g_down geom thread performs the transient upgrade of the bio
request by mapping the pages into the new bio_transient_map KVA
submap.

The bio_transient_map submap claims up to 10% of the buffer map, and
the total buffer_map + bio_transient_map KVA usage stays the
same. Still, it could be manually tuned by kern.bio_transient_maxcnt
tunable, in the units of the transient mappings. Eventually, the
bio_transient_map could be removed after all geom classes and drivers
can accept unmapped i/o requests.

Unmapped support can be turned off by the vfs.unmapped_buf_allowed
tunable, disabling which makes the buffer (or cluster) creation
requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped
buffers are only enabled by default on the architectures where
pmap_copy_page() was implemented and tested.

In the rework, filesystem metadata is not the subject to maxbufspace
limit anymore. Since the metadata buffers are always mapped, the
buffers still have to fit into the buffer map, which provides a
reasonable (but practically unreachable) upper bound on it. The
non-metadata buffer allocations, both mapped and unmapped, is
accounted against maxbufspace, as before. Effectively, this means that
the maxbufspace is forced on mapped and unmapped buffers separately.
The pre-patch bufspace limiting code did not worked, because
buffer_map fragmentation does not allow the limit to be reached.

By Jeff Roberson request, the getnewbuf() function was split into
smaller single-purpose functions.

Sponsored by: The FreeBSD Foundation
Discussed with: jeff (previous version)
Tested by: pho, scottl (previous version), jhb, bf
MFC after: 2 weeks


# 774d251d 17-Mar-2013 Attilio Rao <attilio@FreeBSD.org>

Sync back vmcontention branch into HEAD:
Replace the per-object resident and cached pages splay tree with a
path-compressed multi-digit radix trie.
Along with this, switch also the x86-specific handling of idle page
tables to using the radix trie.

This change is supposed to do the following:
- Allowing the acquisition of read locking for lookup operations of the
resident/cached pages collections as the per-vm_page_t splay iterators
are now removed.
- Increase the scalability of the operations on the page collections.

The radix trie does rely on the consumers locking to ensure atomicity of
its operations. In order to avoid deadlocks the bisection nodes are
pre-allocated in the UMA zone. This can be done safely because the
algorithm needs at maximum one new node per insert which means the
maximum number of the desired nodes is the number of available physical
frames themselves. However, not all the times a new bisection node is
really needed.

The radix trie implements path-compression because UFS indirect blocks
can lead to several objects with a very sparse trie, increasing the number
of levels to usually scan. It also helps in the nodes pre-fetching by
introducing the single node per-insert property.

This code is not generalized (yet) because of the possible loss of
performance by having much of the sizes in play configurable.
However, efforts to make this code more general and then reusable in
further different consumers might be really done.

The only KPI change is the removal of the function vm_page_splay() which
is now reaped.
The only KBI change, instead, is the removal of the left/right iterators
from struct vm_page, which are now reaped.

Further technical notes broken into mealpieces can be retrieved from the
svn branch:
http://svn.freebsd.org/base/user/attilio/vmcontention/

Sponsored by: EMC / Isilon storage division
In collaboration with: alc, jeff
Tested by: flo, pho, jhb, davide
Tested by: ian (arm)
Tested by: andreast (powerpc)


# e8a4a618 14-Mar-2013 Konstantin Belousov <kib@FreeBSD.org>

Add pmap function pmap_copy_pages(), which copies the content of the
pages around, taking array of vm_page_t both for source and
destination. Starting offsets and total transfer size are specified.

The function implements optimal algorithm for copying using the
platform-specific optimizations. For instance, on the architectures
were the direct map is available, no transient mappings are created,
for i386 the per-cpu ephemeral page frame is used. The code was
typically borrowed from the pmap_copy_page() for the same
architecture.

Only i386/amd64, powerpc aim and arm/arm-v6 implementations were
tested at the time of commit. High-level code, not committed yet to
the tree, ensures that the use of the function is only allowed after
explicit enablement.

For sparc64, the existing code has known issues and a stab is added
instead, to allow the kernel linking.

Sponsored by: The FreeBSD Foundation
Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6)
MFC after: 2 weeks


# 9f585991 10-Mar-2013 Alan Cox <alc@FreeBSD.org>

The kernel pmap is statically allocated, so there is really no need to
explicitly initialize its pm_root field to zero.

Sponsored by: EMC / Isilon Storage Division


# 89f6b863 08-Mar-2013 Attilio Rao <attilio@FreeBSD.org>

Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
At this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit. Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


# b38d37f7 02-Mar-2013 Attilio Rao <attilio@FreeBSD.org>

Merge from vmc-playground branch:
Rename the pv_entry_t iterator from pv_list to pv_next.
Besides being more correct technically (as the name seems to suggest
this is a list while it is an iterator), it will also be needed by
vm_radix work to avoid a nameclash on macro expansions.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc, jeff
Tested by: flo, pho, jhb, davide


# dc1558d1 27-Feb-2013 Attilio Rao <attilio@FreeBSD.org>

Merge from vmobj-rwlock:
VM_OBJECT_LOCKED() macro is only used to implement a custom version
of lock assertions right now (which likely spread out thanks to
copy and paste).
Remove it and implement actual assertions.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc
Tested by: pho


# 00a54dfb 15-Feb-2013 Jung-uk Kim <jkim@FreeBSD.org>

Consistently use round_page(x) rather than roundup(x, PAGE_SIZE). There is
no functional change.


# b5821c6f 18-Jan-2013 John Baldwin <jhb@FreeBSD.org>

Fix build with SMP disabled.`

Reported by: bf


# f876ffea 17-Jan-2013 John Baldwin <jhb@FreeBSD.org>

Don't attempt to use clflush on the local APIC register window. Various
CPUs exhibit bad behavior if this is done (Intel Errata AAJ3, hangs on
Pentium-M, and trashing of the local APIC registers on a VIA C7). The
local APIC is implicitly mapped UC already via MTRRs, so the clflush isn't
necessary anyway.

MFC after: 2 weeks


# cfedf924 03-Nov-2012 Attilio Rao <attilio@FreeBSD.org>

Rework the known rwlock to benefit about staying on their own
cache line in order to avoid manual frobbing but using
struct rwlock_padalign.

Reviewed by: alc, jimharris


# 9065aa64 24-Oct-2012 Konstantin Belousov <kib@FreeBSD.org>

Add missed sched_pin().

Submitted by: Svatopluk Kraus <onwahe@gmail.com>
Reviewed by: alc
MFC after: 3 days


# 4e445832 08-Oct-2012 Konstantin Belousov <kib@FreeBSD.org>

Add several asserts to i386 pmap, which mostly state that pv entry shall
have corresponding pte.

Reviewed by: alc
Tested by: pho
MFC after: 3 days


# e1de0706 08-Oct-2012 Alan Cox <alc@FreeBSD.org>

In a few places, like the implementation of ptrace(), a thread may call
upon pmap_enter() to create a mapping within a different address space,
i.e., not the thread's own address space. On i386, this entails the
creation of a temporary mapping to the affected page table page (PTP). In
general, pmap_enter() will read from this PTP, allocate a PV entry, and
write to this PTP. The trouble comes when the system is short of memory.
In order to allocate a new PV entry, an older PV entry has to be
reclaimed. Reclaiming a PV entry involves destroying a mapping, which
requires access to the affected PTP. Thus, the PTP mapped at the
beginning of pmap_enter() is no longer mapped at the end of pmap_enter(),
which leads to pmap_enter() modifying the wrong PTP. To address this
problem, pmap_pv_reclaim() is changed to use an alternate method of
mapping PTPs.

Update a related comment.

Reported by: pho
Diagnosed by: kib
MFC after: 5 days


# e4b8a2fc 27-Sep-2012 Alan Cox <alc@FreeBSD.org>

Eliminate a stale comment. It describes another use case for the pmap in
Mach that doesn't exist in FreeBSD.


# 7336315b 10-Sep-2012 Alan Cox <alc@FreeBSD.org>

Simplify pmap_unmapdev(). Since kmem_free() eventually calls pmap_remove(),
pmap_unmapdev()'s own direct efforts to destroy the page table entries are
redundant, so eliminate them.

Don't set PTE_W on the page table entry in pmap_kenter{,_attr}() on MIPS.
Setting PTE_W on MIPS is inconsistent with the implementation of this
function on other architectures. Moreover, PTE_W should not be set, unless
the pmap's wired mapping count is incremented, which pmap_kenter{,_attr}()
doesn't do.

MFC after: 10 days


# d8f9ed32 05-Sep-2012 Alan Cox <alc@FreeBSD.org>

Rename {_,}pmap_unwire_pte_hold() to {_,}pmap_unwire_ptp() and update the
comment describing them. Both the function names and the comment had grown
stale. Quite some time has passed since these pmap implementations last
used the page's hold count to track the number of valid mapping within a
page table page. Also, returning TRUE from pmap_unwire_ptp() rather than
_pmap_unwire_ptp() eliminates a few instructions from callers like
pmap_enter_quick_locked() where pmap_unwire_ptp()'s return value is used
directly by a conditional statement.


# e93d0cbe 26-Jul-2012 Konstantin Belousov <kib@FreeBSD.org>

MFamd64 r238623:
Introduce curpcb magic variable.

Requested and reviewed by: bde
MFC after: 3 weeks


# e30df26e 26-Jun-2012 Alan Cox <alc@FreeBSD.org>

Add new pmap layer locks to the predefined lock order. Change the names
of a few existing VM locks to follow a consistent naming scheme.


# 23c0d041 03-Jun-2012 Alan Cox <alc@FreeBSD.org>

Various small changes to PV entry management:

Constify pc_freemask[].

pmap_pv_reclaim()
Eliminate "freemask" because it was a pessimization. Add a comment about
the resident count adjustment.

free_pv_entry() [i386 only]
Merge an optimization from amd64 (r233954).

get_pv_entry()
Eliminate the move to tail of the pv_chunk on the global pv_chunks list.
(The right strategy needs more thought. Moreover, there were unintended
differences between the amd64 and i386 implementation.)

pmap_remove_pages()
Eliminate unnecessary ()'s.


# 0d6f49d8 02-Jun-2012 Alan Cox <alc@FreeBSD.org>

Isolate the global pv list lock from data and other locks to prevent false
sharing within the cache.


# d85fbe8a 31-May-2012 Alan Cox <alc@FreeBSD.org>

Eliminate code duplication in free_pv_entry() and pmap_remove_pages() by
introducing free_pv_chunk().


# a2efa424 29-May-2012 Alan Cox <alc@FreeBSD.org>

Eliminate some purely stylistic differences among the amd64, i386 native,
and i386 xen PV entry allocators.


# 4edfd622 28-May-2012 Alan Cox <alc@FreeBSD.org>

Update a comment in get_pv_entry() to reflect the changes to the
synchronization of pv_vafree in r236158.


# 8b0f4e0a 27-May-2012 Alan Cox <alc@FreeBSD.org>

Replace all uses of the vm page queues lock by a r/w lock that is private
to this pmap.c. This new r/w lock is used primarily to synchronize access
to the PV lists. However, it will be used in a somewhat unconventional
way. As finer-grained PV list locking is added to each of the pmap
functions that acquire this r/w lock, its acquisition will be changed from
write to read, enabling concurrent execution of the pmap functions with
finer-grained locking.

X-MFC after: r236045


# 33853281 26-May-2012 Alan Cox <alc@FreeBSD.org>

Rename pmap_collect() to pmap_pv_reclaim() and rewrite it such that it no
longer uses the active and inactive paging queues. Instead, the pmap now
maintains an LRU-ordered list of pv entry pages, and pmap_pv_reclaim() uses
this list to select pv entries for reclamation.

Note: The old pmap_collect() tried to avoid reclaiming mappings for pages
that have either a hold_count or a busy field that is non-zero. However,
this isn't necessary for correctness, and the locking in pmap_collect() was
insufficient to guarantee that such mappings weren't reclaimed. The new
pmap_pv_reclaim() doesn't even try.

MFC after: 5 weeks


# 4e656345 24-May-2012 Alan Cox <alc@FreeBSD.org>

MF amd64 r233097, r233122

With the changes over the past year to how accesses to the page's dirty
field are synchronized, there is no need for pmap_protect() to acquire
the page queues lock unless it is going to access the pv lists or
PMAP1/PADDR1.

Style fix to pmap_protect().


# 5d4c773b 24-Mar-2012 Alan Cox <alc@FreeBSD.org>

Disable detailed PV entry accounting by default. Add a config option
to enable it.

MFC after: 1 week


# fc6e32fb 19-Mar-2012 Konstantin Belousov <kib@FreeBSD.org>

If we ever allow for managed fictitious pages, the pages shall be
excluded from superpage promotions. At least one of the reason is
that pv_table is sized for non-fictitious pages only.

Consistently check for the page to be non-fictitious before accesing
superpage pv list.

Sponsored by: The FreeBSD Foundation
Reviewed by: alc
MFC after: 2 weeks


# fe8b9971 28-Dec-2011 Alan Cox <alc@FreeBSD.org>

Fix a bug in the Xen pmap's implementation of pmap_extract_and_hold():
If the page lock acquisition is retried, then the underlying thread is
not unpinned.

Wrap nearby lines that exceed 80 columns.


# 9800a50f 27-Dec-2011 Alan Cox <alc@FreeBSD.org>

Eliminate many of the unnecessary differences between the native and
paravirtualized pmap implementations for i386. This includes some
style fixes to the native pmap and several bug fixes that were not
previously applied to the paravirtualized pmap.

Tested by: sbruno
MFC after: 3 weeks


# 894b2848 14-Dec-2011 Alan Cox <alc@FreeBSD.org>

Create large page mappings in pmap_map().

MFC after: 6 weeks


# 6472ac3d 07-Nov-2011 Ed Schouten <ed@FreeBSD.org>

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# 703dec68 27-Oct-2011 Alan Cox <alc@FreeBSD.org>

Eliminate vestiges of page coloring in VM_ALLOC_NOOBJ calls to
vm_page_alloc(). While I'm here, for the sake of consistency, always
specify the allocation class, such as VM_ALLOC_NORMAL, as the first of
the flags.


# 3407fefe 06-Sep-2011 Konstantin Belousov <kib@FreeBSD.org>

Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by: alc, attilio
Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by: re (bz)


# d98d0ce2 09-Aug-2011 Konstantin Belousov <kib@FreeBSD.org>

- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag
to VPO_UNMANAGED (and also making the flag protected by the vm object
lock, instead of vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and
VPO_UNMANAGED. As a consequence, pmap code now can use use just
VPO_UNMANAGED to decide whether the page is unmanaged.

Reviewed by: alc
Tested by: pho (x86, previous version), marius (sparc64),
marcel (arm, ia64, powerpc), ray (mips)
Sponsored by: The FreeBSD Foundation
Approved by: re (bz)


# 80788b2a 02-Jul-2011 Alan Cox <alc@FreeBSD.org>

When iterating over a paging queue, explicitly check for PG_MARKER, instead
of relying on zeroed memory being interpreted as an empty PV list.

Reviewed by: kib


# 6bbee8e2 29-Jun-2011 Alan Cox <alc@FreeBSD.org>

Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.

This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write(). It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.

Update all of the existing assertions on pmap_remove_all() to reflect this
change.

Reviewed by: kib


# d16f8274 28-Jun-2011 Attilio Rao <attilio@FreeBSD.org>

Remove pc_cpumask usage from i386 and XEN.

Tested by: pluknet


# 250a44f6 22-Jun-2011 Attilio Rao <attilio@FreeBSD.org>

Remove pc_other_cpus usage from i386 and XEN.

Tested by: pluknet


# d7eb69e1 24-May-2011 Attilio Rao <attilio@FreeBSD.org>

- Fix a misusage of cpuset_t objects
- Fix a typo

Reported by: pluknet


# d30e0db5 22-May-2011 Attilio Rao <attilio@FreeBSD.org>

Add a "safety belt" check for lsb setting.
I don't think it is really necessary because the cpumask is known to be
!= 0, but it is just in case.

Requested by: kib


# b2b45cca 20-May-2011 Attilio Rao <attilio@FreeBSD.org>

Reintroduce the lazypmap infrastructure and convert it to using
cpuset_t.

Requested by: alc


# 71a19bdc 05-May-2011 Attilio Rao <attilio@FreeBSD.org>

Commit the support for removing cpumask_t and replacing it directly with
cpuset_t objects.
That is going to offer the underlying support for a simple bump of
MAXCPU and then support for number of cpus > 32 (as it is today).

Right now, cpumask_t is an int, 32 bits on all our supported architecture.
cpumask_t on the other side is implemented as an array of longs, and
easilly extendible by definition.

The architectures touched by this commit are the following:
- amd64
- i386
- pc98
- arm
- ia64
- XEN

while the others are still missing.
Userland is believed to be fully converted with the changes contained
here.

Some technical notes:
- This commit may be considered an ABI nop for all the architectures
different from amd64 and ia64 (and sparc64 in the future)
- per-cpu members, which are now converted to cpuset_t, needs to be
accessed avoiding migration, because the size of cpuset_t should be
considered unknown
- size of cpuset_t objects is different from kernel and userland (this is
primirally done in order to leave some more space in userland to cope
with KBI extensions). If you need to access kernel cpuset_t from the
userland please refer to example in this patch on how to do that
correctly (kgdb may be a good source, for example).
- Support for other architectures is going to be added soon
- Only MAXCPU for amd64 is bumped now

The patch has been tested by sbruno and Nicholas Esborn on opteron
4 x 12 pack CPUs. More testing on big SMP is expected to came soon.
pluknet tested the patch with his 8-ways on both amd64 and i386.

Tested by: pluknet, sbruno, gianni, Nicholas Esborn
Reviewed by: jeff, jhb, sbruno


# 97340772 30-Apr-2011 Attilio Rao <attilio@FreeBSD.org>

Remove the support for lazy cr3 switching from i386.
amd64 has already this micro-optimization removed.

Submitted by: kib


# 3136faa5 18-Apr-2011 Konstantin Belousov <kib@FreeBSD.org>

Make pmap_invalidate_cache_range() available for consumption on amd64.

Add pmap_invalidate_cache_pages() method on x86. It flushes the CPU
cache for the set of pages, which are not neccessary mapped. Since its
supposed use is to prepare the move of the pages ownership to a device
that does not snoop all CPU accesses to the main memory (read GPU in
GMCH), do not rely on CPU self-snoop feature.

amd64 implementation takes advantage of the direct map. On i386,
extract the helper pmap_flush_page() from pmap_page_set_memattr(), and
use it to make a temporary mapping of the flushed page.

Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks


# 4053b05b 21-Jan-2011 Sergey Kandaurov <pluknet@FreeBSD.org>

Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize.

Submitted by: perryh pluto.rain.com (previous version)
Reviewed by: jhb
Approved by: kib (mentor)
Tested by: universe


# 9d555e45 19-Dec-2010 Alan Cox <alc@FreeBSD.org>

Redo some parts of r216333, specifically, the locking changes to
pmap_extract_and_hold(), and undo the rest. In particular, I forgot
that PG_PS and PG_PTE_PAT are the same bit.


# a9b31c25 18-Dec-2010 Konstantin Belousov <kib@FreeBSD.org>

In pmap_extract(), unlock pmap lock earlier. The calculation does not need
the lock when operating on local variables.

Reviewed by: alc


# d1cf854b 09-Dec-2010 Alan Cox <alc@FreeBSD.org>

When r207410 eliminated the acquisition and release of the page queues
lock from pmap_extract_and_hold(), it didn't take into account that
pmap_pte_quick() sometimes requires the page queues lock to be held.
This change reimplements pmap_extract_and_hold() such that it no
longer uses pmap_pte_quick(), and thus never requires the page queues
lock.

For consistency, adopt the same idiom as used by the new
implementation of pmap_extract_and_hold() in pmap_extract() and
pmap_mincore(). It also happens to make these functions shorter.

Fix a style error in pmap_pte().

Reviewed by: kib@


# d2d0fda8 23-Nov-2010 Jung-uk Kim <jkim@FreeBSD.org>

Remove a stale tunable introduced in r215703.


# 7dd052c1 22-Nov-2010 Jung-uk Kim <jkim@FreeBSD.org>

- Disable caches and flush caches/TLBs when we update PAT as we do for MTRR.
Flushing TLBs is required to ensure cache coherency according to the AMD64
architecture manual. Flushing caches is only required when changing from a
cacheable memory type (WB, WP, or WT) to an uncacheable type (WC, UC, or
UC-). Since this function is only used once per processor during startup,
there is no need to take any shortcuts.
- Leave PAT indices 0-3 at the default of WB, WT, UC-, and UC. Program 5 as
WP (from default WT) and 6 as WC (from default UC-). Leave 4 and 7 at the
default of WB and UC. This is to avoid transition from a cacheable memory
type to an uncacheable type to minimize possible cache incoherency. Since
we perform flushing caches and TLBs now, this change may not be necessary
any more but we do not want to take any chances.
- Remove Apple hardware specific quirks. With the above changes, it seems
this hack is no longer needed.
- Improve pmap_cache_bits() with an array to map PAT memory type to index.
This array is initialized early from pmap_init_pat(), so that we do not need
to handle special cases in the function any more. Now this function is
identical on both amd64 and i386.

Reviewed by: jhb
Tested by: RM (reuf_m at hotmail dot com)
Ryszard Czekaj (rychoo at freeshell dot net)
army.of.root (army dot of dot root at googlemail dot com)
MFC after: 3 days


# 228a2537 07-Nov-2010 Alan Cox <alc@FreeBSD.org>

Eliminate a possible race between pmap_pinit() and pmap_kenter_pde() on
superpage promotion or demotion.

Micro-optimize pmap_kenter_pde().

Reviewed by: kib, jhb (an earlier version)
MFC after: 1 week


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# fb4c8540 05-Oct-2010 Alan Cox <alc@FreeBSD.org>

Initialize KPTmap in locore so that vm86.c can call vtophys() (or really
pmap_kextract()) before pmap_bootstrap() is called.

Document the set of pmap functions that may be called before
pmap_bootstrap() is called.

Tested by: bde@
Reviewed by: kib@
Discussed with: jhb@
MFC after: 6 weeks


# e0e08e6a 16-Aug-2010 Pietro Cerutti <gahr@FreeBSD.org>

- The iMac9,1 needs the PAT workaround as well

Approved by: cognet


# 4c967b61 10-Aug-2010 Attilio Rao <attilio@FreeBSD.org>

Fix some places that may use cpumask_t while they still use 'int' types.
While there, also fix some places assuming cpu type is 'int' while
u_int is really meant.

Note: this will also fix some possible races in per-cpu data accessings
to be addressed in further commits.

In collabouration with: Yahoo! Incorporated (via sbruno and peter)
Tested by: gianni
MFC after: 1 month


# 8155e5d5 10-Jul-2010 Alan Cox <alc@FreeBSD.org>

Reduce the number of global TLB shootdowns generated by pmap_qenter().
Specifically, teach pmap_qenter() to recognize the case when it is being
asked to replace a mapping with the very same mapping and not generate
a shootdown. Unfortunately, the buffer cache commonly passes an entire
buffer to pmap_qenter() when only a subset of the mappings are changing.
For the extension of buffers in allocbuf() this was resulting in
unnecessary shootdowns. The addition of new pages to the end of the
buffer need not and did not trigger a shootdown, but overwriting the
initial mappings with the very same mappings was seen as a change that
necessitated a shootdown. With this change, that is no longer so.

For a "buildworld" on amd64, this change eliminates 14-15% of the
pmap_invalidate_range() shootdowns, and about 4% of the overall
shootdowns.

MFC after: 3 weeks


# 2680dac9 09-Jul-2010 Konstantin Belousov <kib@FreeBSD.org>

For both i386 and amd64 pmap,
- change the type of pm_active to cpumask_t, which it is;
- in pmap_remove_pages(), compare with PCPU(curpmap), instead of
dereferencing the long chain of pointers [1].
For amd64 pmap, remove the unneeded checks for validity of curpmap
in pmap_activate(), since curpmap should be always valid after
r209789.

Submitted by: alc [1]
Reviewed by: alc
MFC after: 3 weeks


# 9124d0d6 11-Jun-2010 Alan Cox <alc@FreeBSD.org>

Relax one of the new assertions in pmap_enter() a little. Specifically,
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)


# ce186587 10-Jun-2010 Alan Cox <alc@FreeBSD.org>

Reduce the scope of the page queues lock and the number of
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines. Update a stale comment.

Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked. Add a comment documenting this.

Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.

Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick(). (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)

Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page. Assert this rather than returning "0"
and "FALSE" respectively.

ARM:

Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().

Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.

PowerPC/AIM:

moea*_page_exits_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete. Therefore, the check
for moea_initialized can be eliminated.

Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().

The last parameter to moea*_clear_bit() is never used. Eliminate it.

PowerPC/BookE:

Simplify mmu_booke_page_exists_quick()'s control flow.

Reviewed by: kib@


# acb4c5ec 07-Jun-2010 Alan Cox <alc@FreeBSD.org>

MFC r208765
In the unlikely event that pmap_ts_referenced() demoted five superpage
mappings to the same underlying physical page, the calling thread would
be left forever pinned to the same processor.

Approved by: re (kib)


# 966898be 02-Jun-2010 Alan Cox <alc@FreeBSD.org>

In the unlikely event that pmap_ts_referenced() demoted five superpage
mappings to the same underlying physical page, the calling thread would be
left forever pinned to the same processor.

MFC after: 3 days


# b2830a96 31-May-2010 Alan Cox <alc@FreeBSD.org>

Eliminate a stale comment.


# 72dc3eb6 30-May-2010 Alan Cox <alc@FreeBSD.org>

Simplify the inner loop of pmap_collect(): While iterating over the page's
pv list, there is no point in checking whether or not the pv list is empty.
Instead, wait until the loop completes.


# 8f0d5d3b 29-May-2010 Alan Cox <alc@FreeBSD.org>

When I pushed down the page queues lock into pmap_is_modified(), I created
an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls
vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified()
could return FALSE without acquiring the page queues lock because the page
is not (currently) writeable, and the caller to pmap_is_modified() might
believe that the page's dirty field is clear because it has not seen the
effect of the vm_page_dirty() call.

When I pushed down the page queues lock into pmap_is_modified(), I
overlooked one place where this ordering dependence is violated:
pmap_enter(). In a rare situation pmap_enter() can be called to replace a
dirty mapping to one page with a mapping to another page. (I say rare
because replacements generally occur as a result of a copy-on-write fault,
and so the old page is not dirty.) This change delays clearing PG_WRITEABLE
until after vm_page_dirty() has been called.

Fixing the ordering dependency also makes it easy to introduce a small
optimization: When pmap_enter() used to replace a mapping to one page with a
mapping to another page, it freed the pv entry for the first mapping and
later called the pv entry allocator for the new mapping. Now, pmap_enter()
attempts to recycle the old pv entry, saving two calls to the pv entry
allocator.

There is no point in setting PG_WRITEABLE on unmanaged pages, so don't.
Update a comment to reflect this.

Tidy up the variable declarations at the start of pmap_enter().


# 52d8ba37 28-May-2010 Alan Cox <alc@FreeBSD.org>

Defer freeing any page table pages in pmap_remove_all() until after the
page queues lock is released. This may reduce the amount of time that the
page queues lock is held by pmap_remove_all().


# c46b90e9 26-May-2010 Alan Cox <alc@FreeBSD.org>

Push down page queues lock acquisition in pmap_enter_object() and
pmap_is_referenced(). Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively. In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.

Assert that the page is managed in pmap_is_referenced().

On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.

Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting. This scenario is described in detail by a
comment.

Correct a spelling error in vm_page_dontneed().

Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.

Add object locking to vnode_pager_generic_putpages(). This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.

Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().

Change vnode_pager_generic_putpages() to the modern-style of function
definition. Also, change the name of one of the parameters to follow
virtual memory system naming conventions.

Reviewed by: kib


# 567e51e1 24-May-2010 Alan Cox <alc@FreeBSD.org>

Roughly half of a typical pmap_mincore() implementation is machine-
independent code. Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified(). Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization. Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean(). Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by: kib (an earlier version)


# 9ab6032f 16-May-2010 Alan Cox <alc@FreeBSD.org>

On entry to pmap_enter(), assert that the page is busy. While I'm
here, make the style of assertion used by pmap_enter() consistent
across all architectures.

On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.

With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy. Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.

Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try. Previously, the call to pmap_remove_write()
would have failed silently.


# 3c4a2440 08-May-2010 Alan Cox <alc@FreeBSD.org>

Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free(). Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped(). (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.


# 7024db1d 06-May-2010 Alan Cox <alc@FreeBSD.org>

Push down the page queues lock inside of vm_page_free_toq() and
pmap_page_is_mapped() in preparation for removing page queues locking
around calls to vm_page_free(). Setting aside the assertion that calls
pmap_page_is_mapped(), vm_page_free_toq() now acquires and holds the page
queues lock just long enough to actually add or remove the page from the
paging queues.

Update vm_page_unhold() to reflect the above change.


# 2965a453 29-Apr-2010 Kip Macy <kmacy@FreeBSD.org>

On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib


# 0d2e1c3e 25-Apr-2010 Alan Cox <alc@FreeBSD.org>

Clearing a page table entry's accessed bit (PG_A) and setting the
page's PG_REFERENCED flag in pmap_protect() can't really be justified.
In contrast to pmap_remove() or pmap_remove_all(), the mapping is not
being destroyed, so the notion that the page was accessed is not lost.
Moreover, clearing the page table entry's accessed bit and setting the
page's PG_REFERENCED flag can throw off the page daemon's activity
count calculation. Finally, in my tests, I found that 15% of the
atomic memory operations being performed by pmap_protect() were only
to clear PG_A, and not change protection. This could, by itself, be
fixed, but I don't see the point given the above argument.

Remove a comment from pmap_protect_pde() that is no longer meaningful
after the above change.


# c5cc832f 24-Apr-2010 Kip Macy <kmacy@FreeBSD.org>

- fix style issues on i386 as well

requested by: alc@


# 7b85f591 24-Apr-2010 Alan Cox <alc@FreeBSD.org>

Resurrect pmap_is_referenced() and use it in mincore(). Essentially,
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters. For example, in mincore(), clearing the reference
bits has two negative consequences. First, it throws off the activity
count calculations performed by the page daemon. Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should. Consequently, the page could be
deactivated prematurely by the page daemon. Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page. However, there is a second problem for which
that is not a solution. In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping. Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!


# 02b5123e 05-Apr-2010 Alan Cox <alc@FreeBSD.org>

MFC r204907, r204913, r205402, r205573, r205573
Implement AMD's recommended workaround for Erratum 383 on Family 10h
processors.

Enable machine check exceptions by default.


# c7014073 03-Apr-2010 Alan Cox <alc@FreeBSD.org>

MFC r205652
A ptrace(2) by one process may trigger a page size promotion in the
address space of another process. Modify pmap_promote_pde() to handle
this.


# dfeca187 30-Mar-2010 Marcel Moolenaar <marcel@FreeBSD.org>

MFC rev 198341 and 198342:
o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
that translates the vm_map_t argumument to pmap_t.
o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
it replaces the pmap_page_executable() function, added to solve
the I-cache problem in uiomove_fromphys().
o In proc_rwmem() call vm_sync_icache() when writing to a page that
has execute permissions. This assures that when breakpoints are
written, the I-cache will be coherent and the process will actually
hit the breakpoint.
o This also fixes the Book-E PMAP implementation that was missing
necessary locking while trying to deal with the I-cache coherency
in pmap_enter() (read: mmu_booke_enter_locked).


# 3792de2e 27-Mar-2010 Alan Cox <alc@FreeBSD.org>

Correctly handle preemption of pmap_update_pde_invalidate().

X-MFC after: r205573


# a57d0d8e 27-Mar-2010 Alan Cox <alc@FreeBSD.org>

Simplify pmap_growkernel(), making the i386 version more like the amd64
version.

MFC after: 3 weeks


# 09fcdf11 25-Mar-2010 Alan Cox <alc@FreeBSD.org>

A ptrace(2) by one processor may trigger a promotion in the address space
of another process. Modify pmap_promote_pde() to handle this. (This is
not a problem on amd64 due to implementation differences.)

Reported by: jh@
MFC after: 1 week


# 2dec7615 24-Mar-2010 Konstantin Belousov <kib@FreeBSD.org>

MFC r204957:
Fall back to wbinvd when region for CLFLUSH is >= 2MB.

MFC r205334 (by avg):
Fix a typo in a comment.


# e1990590 23-Mar-2010 Alan Cox <alc@FreeBSD.org>

Adapt r204907 and r205402, the amd64 implementation of the workaround for
AMD Family 10h Erratum 383, to i386.

Enable machine check exceptions by default, just like r204913 for amd64.

Enable superpage promotion only if the processor actually supports large
pages, i.e., PG_PS.

MFC after: 2 weeks


# 9344361b 19-Mar-2010 Andriy Gapon <avg@FreeBSD.org>

pmap amd64/i386: fix a typo in a comment

MFC after: 3 days


# 55c4e016 11-Mar-2010 John Baldwin <jhb@FreeBSD.org>

Fix the previous attempt to fix kernel builds of HEAD on 7.x. Use the
__gnu_inline__ attribute for PMAP_INLINE when using the 7.x compiler to
match what 7.x uses for PMAP_INLINE.


# 2a595a40 10-Mar-2010 Konstantin Belousov <kib@FreeBSD.org>

Fall back to wbinvd when region for CLFLUSH is >= 2MB.

Submitted by: Kevin Day <toasty dragondata com>
Reviewed by: jhb
MFC after: 2 weeks


# c288186f 02-Mar-2010 Alan Cox <alc@FreeBSD.org>

MFC r204420
When running as a guest operating system, the FreeBSD kernel must assume
that the virtual machine monitor has enabled machine check exceptions.
Unfortunately, on AMD Family 10h processors the machine check hardware
has a bug (Erratum 383) that can result in a false machine check exception
when a superpage promotion occurs. Thus, I am disabling superpage
promotion when the FreeBSD kernel is running as a guest operating system
on an AMD Family 10h processor.


# 0b993ee5 27-Feb-2010 Alan Cox <alc@FreeBSD.org>

When running as a guest operating system, the FreeBSD kernel must assume
that the virtual machine monitor has enabled machine check exceptions.
Unfortunately, on AMD Family 10h processors the machine check hardware
has a bug (Erratum 383) that can result in a false machine check exception
when a superpage promotion occurs. Thus, I am disabling superpage
promotion when the FreeBSD kernel is running as a guest operating system
on an AMD Family 10h processor.

Reviewed by: jhb, kib
MFC after: 3 days


# 454947a6 20-Feb-2010 Alan Cox <alc@FreeBSD.org>

MFC r203085
Optimize pmap_demote_pde() by using the new KPTmap to access a kernel
page table page instead of creating a temporary mapping to it.

Set the PG_G bit on the page table entries that implement the KPTmap.

Locore initializes the unused portions of the NKPT kernel page table
pages that it allocates to zero. So, pmap_bootstrap() needn't zero
the page table entries referenced by CMAP1 and CMAP3.

Simplify pmap_set_pg().


# ddc534916 18-Feb-2010 Ed Schouten <ed@FreeBSD.org>

Allow the pmap code to be built with GCC from FreeBSD 7 again.

This patch basically gives us the best of both worlds. Instead of
forcing the compiler to emulate GNU-style inline semantics even though
we're using ISO C99, it will only use GNU-style inlining when the
compiler is configured that way (__GNUC_GNU_INLINE__).

Tested by: jhb


# bda39c37 01-Feb-2010 Alan Cox <alc@FreeBSD.org>

Change the default value for the flag enabling superpage mapping and
promotion to "on".


# 6d41eae0 29-Jan-2010 Alan Cox <alc@FreeBSD.org>

MFC r202894
Handle a race between pmap_kextract() and pmap_promote_pde().


# b3021b93 27-Jan-2010 Alan Cox <alc@FreeBSD.org>

Optimize pmap_demote_pde() by using the new KPTmap to access a kernel
page table page instead of creating a temporary mapping to it.

Set the PG_G bit on the page table entries that implement the KPTmap.

Locore initializes the unused portions of the NKPT kernel page table
pages that it allocates to zero. So, pmap_bootstrap() needn't zero
the page table entries referenced by CMAP1 and CMAP3.

Simplify pmap_set_pg().

MFC after: 10 days


# cf350851 23-Jan-2010 Alan Cox <alc@FreeBSD.org>

Handle a race between pmap_kextract() and pmap_promote_pde(). This race is
known to cause a kernel crash in ZFS on i386 when superpage promotion is
enabled.

Tested by: netchild
MFC after: 1 week


# 91bfd816 19-Jan-2010 Ed Schouten <ed@FreeBSD.org>

Recommit r193732:

Remove __gnu89_inline.

Now that we use C99 almost everywhere, just use C99-style in the pmap
code. Since the pmap code is the only consumer of __gnu89_inline, remove
it from cdefs.h as well. Because the flag was only introduced 17 months
ago, I don't expect any problems.

Reviewed by: alc

It was backed out, because it prevented us from building kernels using a
7.x compiler. Now that most people use 8.x, there is nothing that holds
us back. Even if people run 7.x, they should be able to build a kernel
if they run `make kernel-toolchain' or `make buildworld' first.


# 294a68ca 18-Jan-2010 Alan Cox <alc@FreeBSD.org>

MFC r202085
Simplify pmap_init(). Additionally, correct a harmless misbehavior on
i386.


# ac24a8ea 11-Jan-2010 Alan Cox <alc@FreeBSD.org>

Simplify pmap_init(). Additionally, correct a harmless misbehavior on i386.
Specifically, where locore had created large page mappings for the kernel,
the wrong vm page array entries were being initialized. The vm page array
entries for the pages containing the kernel were being initialized instead
of the vm page array entries for page table pages.

MFC after: 1 week


# 0f59b74f 09-Jan-2010 Alan Cox <alc@FreeBSD.org>

Long ago, in r120654, the rounding of KERNend and physfree in locore
was changed from a small page boundary to a large page boundary. As
a consequence pmap_kmem_choose() became a pointless waste of address
space. Eliminate it.


# 28a5e2a5 07-Jan-2010 Alan Cox <alc@FreeBSD.org>

Make pmap_set_pg() static.


# 3ed91d6a 08-Dec-2009 Andriy Gapon <avg@FreeBSD.org>

MFC r199184: reflect that pg_ps_enabled is a tunable


# 6cc16fcb 11-Nov-2009 Andriy Gapon <avg@FreeBSD.org>

reflect that pg_ps_enabled is a tunable, not just a read-only sysctl

Nod from: jhb


# dcf9f137 06-Nov-2009 Attilio Rao <attilio@FreeBSD.org>

MFC r197070:
Consolidate CPUID to CPU family/model macros for amd64 and i386 to reduce
unnecessary #ifdef's for shared code between them.

This MFC should unbreak the kernel build breakage introduced by
r198977.

Reported by: kib
Pointy hat to: me


# 13529df1 31-Oct-2009 Alan Cox <alc@FreeBSD.org>

MFC r197317
When superpages are enabled, add the 2 or 4MB page size to the array of
supported page sizes.


# 1a4fcaeb 21-Oct-2009 Marcel Moolenaar <marcel@FreeBSD.org>

o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
that translates the vm_map_t argumument to pmap_t.
o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
it replaces the pmap_page_executable() function, added to solve
the I-cache problem in uiomove_fromphys().
o In proc_rwmem() call vm_sync_icache() when writing to a page that
has execute permissions. This assures that when breakpoints are
written, the I-cache will be coherent and the process will actually
hit the breakpoint.
o This also fixes the Book-E PMAP implementation that was missing
necessary locking while trying to deal with the I-cache coherency
in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.


# d6dbb0db 18-Sep-2009 Alan Cox <alc@FreeBSD.org>

When superpages are enabled, add the 2 or 4MB page size to the array of
supported page sizes.

Reviewed by: jhb
MFC after: 3 weeks


# 3bcdfb9b 10-Sep-2009 Jung-uk Kim <jkim@FreeBSD.org>

Consolidate CPUID to CPU family/model macros for amd64 and i386 to reduce
unnecessary #ifdef's for shared code between them.


# bf202eb1 03-Sep-2009 John Baldwin <jhb@FreeBSD.org>

MFC 196705 and 196707:
- Improve pmap_change_attr() on i386 so that it is able to demote a large
(2/4MB) page into 4KB pages as needed. This should be fairly rare in
practice.
- Simplify pmap_change_attr() a bit:
- Always calculate the cache bits instead of doing it on-demand.
- Always set changed to TRUE rather than only doing it if it is false.

Approved by: re (kib)


# c8e648e1 02-Sep-2009 Jung-uk Kim <jkim@FreeBSD.org>

Fix confusing comments about default PAT entries.


# c9e88179 02-Sep-2009 Jung-uk Kim <jkim@FreeBSD.org>

- Work around ACPI mode transition problem for recent NVIDIA 9400M chipset
based Intel Macs. Since r189055, these platforms started freezing when
ACPI is being initialized for unknown reason. For these platforms, we just
use the old PAT layout. Note this change is not enough to boot fully on
these platforms because of other problems but it makes debugging possible.
Note MacBook5,2 may be affected as well but it was not added here because
of lack of hardware to test.
- Initialize PAT MSR fully instead of reading and modifying it for safety.

Reported by: rpaulo, hps, Eygene Ryabinkin (rea-fbsd at codelabs dot ru)
Reviewed by: jhb


# e2ba7437 01-Sep-2009 Robert Noland <rnoland@FreeBSD.org>

MFC 196643

Swap the start/end virtual addresses in pmap_invalidate_cache_range().

This fixes the functionality on non SelfSnoop hardware.

Found by: rnoland
Submitted by: alc
Reviewed by: kib
Approved by: re (rwatson)


# 8101afb6 31-Aug-2009 John Baldwin <jhb@FreeBSD.org>

Simplify pmap_change_attr() a bit:
- Always calculate the cache bits instead of doing it on-demand.
- Always set changed to TRUE rather than only doing it if it is false.

Discussed with: alc
MFC after: 3 days


# 75e66e42 31-Aug-2009 John Baldwin <jhb@FreeBSD.org>

Improve pmap_change_attr() so that it is able to demote a large (2/4MB)
page into 4KB pages as needed. This should be fairly rare in practice
on i386. This includes merging the following changes from the amd64 pmap:
180430, 180485, 180845, 181043, 181077, and 196318.
- Add basic support for changing attributes on PDEs to pmap_change_attr()
similar to the support in the initial version of pmap_change_attr() on
amd64 including inlines for pmap_pde_attr() and pmap_pte_attr().
- Extend pmap_demote_pde() to include the ability to instantiate a new page
table page where none existed before.
- Enhance pmap_change_attr(). Use pmap_demote_pde() to demote a 2/4MB page
mapping to 4KB page mappings when the specified attribute change only
applies to a portion of the 2/4MB page. Previously, in such cases,
pmap_change_attr() gave up and returned an error.
- Correct a critical accounting error in pmap_demote_pde().

Reviewed by: alc
MFC after: 3 days


# cbc3c1f6 29-Aug-2009 Robert Noland <rnoland@FreeBSD.org>

Swap the start/end virtual addresses in pmap_invalidate_cache_range().

This fixes the functionality on non SelfSnoop hardware.

Found by: rnoland
Submitted by: alc
Reviewed by: kib
MFC after: 3 days


# 8a5ac5d5 29-Jul-2009 Konstantin Belousov <kib@FreeBSD.org>

As was done in r195820 for amd64, use clflush for flushing cache lines
when memory page caching attributes changed, and CPU does not support
self-snoop, but implemented clflush, for i386.

Take care of possible mappings of the page by sf buffer by utilizing
the mapping for clflush, otherwise map the page transiently. Amd64
used direct map.

Proposed and reviewed by: alc
Approved by: re (kensmith)


# 01381811 24-Jul-2009 John Baldwin <jhb@FreeBSD.org>

Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses. The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks


# 33131452 23-Jul-2009 Alan Cox <alc@FreeBSD.org>

Eliminate unnecessary cache and TLB flushes by pmap_change_attr(). (This
optimization was implemented in the amd64 version roughly 1 year ago.)

Approved by: re (kensmith)


# 9861cbc6 19-Jul-2009 Alan Cox <alc@FreeBSD.org>

Change the handling of fictitious pages by pmap_page_set_memattr() on
amd64 and i386. Essentially, fictitious pages provide a mechanism for
creating aliases for either normal or device-backed pages. Therefore,
pmap_page_set_memattr() on a fictitious page needn't update the direct
map or flush the cache. Such actions are the responsibility of the
"primary" instance of the page or the device driver that "owns" the
physical address. For example, these actions are already performed by
pmap_mapdev().

The device pager needn't restore the memory attributes on a fictitious
page before releasing it. It's now pointless.

Add pmap_page_set_memattr() to the Xen pmap.

Approved by: re (kib)


# 13de7221 17-Jul-2009 Alan Cox <alc@FreeBSD.org>

An addendum to r195649, "Add support to the virtual memory system for
configuring machine-dependent memory attributes...":

Don't set the memory attribute for a "real" page that is allocated to
a device object in vm_page_alloc(). It is a pointless act, because
the device pager replaces this "real" page with a "fake" page and sets
the memory attribute on that "fake" page.

Eliminate pointless code from pmap_cache_bits() on amd64.

Employ the "Self Snoop" feature supported by some x86 processors to
avoid cache flushes in the pmap.

Approved by: re (kib)


# 3153e878 12-Jul-2009 Alan Cox <alc@FreeBSD.org>

Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes. Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures. The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map. The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

kmem_alloc_contig() can now be used to allocate kernel memory with
non-default memory attributes on amd64 and i386.

vm_page_alloc() and the device pager will set the memory attributes
for the real or fictitious page according to the object's default
memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386. In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by: re (kib)


# 0e18ab26 05-Jul-2009 Alan Cox <alc@FreeBSD.org>

PAE adds another level to the i386 page table. This level is a small
4-entry table that must be located within the first 4GB of RAM. This
requirement is met by defining an UMA zone with a custom back-end
allocator function. This revision makes two changes to this back-end
allocator function: (1) It replaces the use of contigmalloc() with the
use of kmem_alloc_contig(). This eliminates "double accounting", i.e.,
accounting by both the UMA zone and malloc tags. (I made the same
change for the same reason to the zones supporting jumbo frames a week
ago.) (2) It passes through the "wait" parameter, i.e., M_WAITOK,
M_ZERO, etc. to kmem_alloc_contig() rather than ignoring it.
pmap_init() calls uma_zalloc() with both M_WAITOK and M_ZERO. At the
moment, this is harmless only because the default behavior of
contigmalloc()/kmem_alloc_contig() is to wait and because pmap_init()
doesn't really depend on the memory being zeroed.

The back-end allocator function in the Xen pmap is dead code. I am
changing it nonetheless because I don't want to leave any "bad examples"
in the source tree for someone to copy at a later date.

Approved by: re (kib)


# 387aabc5 14-Jun-2009 Alan Cox <alc@FreeBSD.org>

Long, long ago in r27464 special case code for mapping device-backed
memory with 4MB pages was added to pmap_object_init_pt(). This code
assumes that the pages of a OBJT_DEVICE object are always physically
contiguous. Unfortunately, this is not always the case. For example,
jhb@ informs me that the recently introduced /dev/ksyms driver creates
a OBJT_DEVICE object that violates this assumption. Thus, this
revision modifies pmap_object_init_pt() to abort the mapping if the
OBJT_DEVICE object's pages are not physically contiguous. This
revision also changes some inconsistent if not buggy behavior. For
example, the i386 version aborts if the first 4MB virtual page that
would be mapped is already valid. However, it incorrectly replaces
any subsequent 4MB virtual page mappings that it encounters,
potentially leaking a page table page. The amd64 version has a bug of
my own creation. It potentially busies the wrong page and always an
insufficent number of pages if it blocks allocating a page table page.

To my knowledge, there have been no reports of these bugs, hence,
their persistance. I suspect that the existing restrictions that
pmap_object_init_pt() placed on the OBJT_DEVICE objects that it would
choose to map, for example, that the first page must be aligned on a 2
or 4MB physical boundary and that the size of the mapping must be a
multiple of the large page size, were enough to avoid triggering the
bug for drivers like ksyms. However, one side effect of testing the
OBJT_DEVICE object's pages for physical contiguity is that a dubious
difference between pmap_object_init_pt() and the standard path for
mapping devices pages, i.e., vm_fault(), has been eliminated.
Previously, pmap_object_init_pt() would only instantiate the first
PG_FICTITOUS page being mapped because it never examined the rest.
Now, however, pmap_object_init_pt() uses the new function
vm_object_populate() to instantiate them all (in order to support
testing their physical contiguity). These pages need to be
instantiated for the mechanism that I have prototyped for
automatically maintaining the consistency of the PAT settings across
multiple mappings, particularly, amd64's direct mapping, to work.
(Translation: This change is also being made to support jhb@'s work on
the Nvidia feature requests.)

Discussed with: jhb@


# 5942207f 08-Jun-2009 Ed Schouten <ed@FreeBSD.org>

Revert my change; reintroduce __gnu89_inline.

It turns out our compiler in stable/7 can't build this code anymore.
Even though my opinion is that those people should just run `make
kernel-toolchain' before building a kernel, I am willing to wait and
commit this after we've branched stable/8.

Requested by: rwatson


# 032e3d1d 08-Jun-2009 Ed Schouten <ed@FreeBSD.org>

Remove __gnu89_inline.

Now that we use C99 almost everywhere, just use C99-style in the pmap
code. Since the pmap code is the only consumer of __gnu89_inline, remove
it from cdefs.h as well. Because the flag was only introduced 17 months
ago, I don't expect any problems.

Reviewed by: alc


# 120b18d8 14-May-2009 Attilio Rao <attilio@FreeBSD.org>

FreeBSD right now support 32 CPUs on all the architectures at least.
With the arrival of 128+ cores it is necessary to handle more than that.
One of the first thing to change is the support for cpumask_t that needs
to handle more than 32 bits masking (which happens now). Some places,
however, still assume that cpumask_t is a 32 bits mask.
Fix that situation by using always correctly cpumask_t when needed.

While here, remove the part under STOP_NMI for the Xen support as it
is broken in any case.

Additively make ipi_nmi_pending as static.

Reviewed by: jhb, kmacy
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


# 07a7b85e 13-May-2009 Alan Cox <alc@FreeBSD.org>

Correct a rare use-after-free error in pmap_copy(). This error was
introduced in amd64 revision 1.540 and i386 revision 1.547. However, it
had no harmful effects until after a recent change, r189698, on amd64.
(In other words, the error is harmless in RELENG_7.)

The error is triggered by the failure to allocate a pv entry for the one
and only mapping in a page table page. I am addressing the error by
changing pmap_copy() to abort if either pv entry allocation or page
table page allocation fails. This is appropriate because the creation of
mappings by pmap_copy() is optional. They are a (possible) optimization,
and not a requirement.

Correct a nearby whitespace error in the i386 pmap_copy().

Crash reported by: jeff@
MFC after: 6 weeks


# e34a906f 14-Mar-2009 Alan Cox <alc@FreeBSD.org>

MFamd64 r189785
Update the pmap's resident page count when a page table page is freed in
pmap_remove_pde() and pmap_remove_pages().

MFC after: 6 weeks


# a4079bfb 25-Feb-2009 Jung-uk Kim <jkim@FreeBSD.org>

Enable support for PAT_WRITE_PROTECTED and PAT_UNCACHED cache modes
unconditionally on amd64. On i386, we assume PAT is usable if the CPU
vendor is not Intel or CPU model is newer than Pentium IV.

Reviewed by: alc, jhb


# 6d65f2fa 22-Feb-2009 Alan Cox <alc@FreeBSD.org>

Optimize free_pv_entry(); specifically, avoid repeated TAILQ_REMOVE()s.

MFC after: 1 week


# 6be00eca 14-Feb-2009 Alan Cox <alc@FreeBSD.org>

Remove unnecessary page queues locking around vm_page_busy() and
vm_page_wakeup(). (This change is applicable to RELENG_7 but not
RELENG_6.)

MFC after: 1 week


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 1628900b 20-Sep-2008 Alan Cox <alc@FreeBSD.org>

MFamd64 SVN rev 179749 CVS rev 1.620
Reverse the direction of pmap_promote_pde()'s traversal over the specified
page table page. The direction of the traversal can matter if
pmap_promote_pde() has to remove write access (PG_RW) from a PTE that
hasn't been modified (PG_M). In general, if there are two or more such
PTEs to choose among, it is better to write protect the one nearer the
high end of the page table page rather than the low end. This is because
most programs access memory in an ascending direction. The net result of
this change is a sometimes significant reduction in the number of failed
promotion attempts and the number of pages that are write protected by
pmap_promote_pde().

MFamd64 SVN rev 179777 CVS rev 1.621
Tweak the promotion test in pmap_promote_pde(). Specifically, test PG_A
before PG_M. This sometimes prevents unnecessary removal of write access
from a PTE. Overall, the net result is fewer demotions and promotion
failures.


# 2af89e55 18-Sep-2008 Alan Cox <alc@FreeBSD.org>

MFamd64 SVN rev 179471 CVS rev 1.619
Correct an error in pmap_promote_pde() that may result in an errant
promotion within the kernel's address space.


# 494c177e 04-Aug-2008 Alan Cox <alc@FreeBSD.org>

Make pmap_kenter_attr() static.


# e79980e1 27-Jul-2008 Alan Cox <alc@FreeBSD.org>

Correct an off-by-one error in the previous change to pmap_change_attr().
Change the nearby comment to mention the recursive map.


# cc1ec88f 27-Jul-2008 Alan Cox <alc@FreeBSD.org>

Don't allow pmap_change_attr() to be applied to the recursive mapping.


# 35db2ce0 27-Jul-2008 Alan Cox <alc@FreeBSD.org>

Style fixes to several function definitions.


# 59a23cac 18-Jul-2008 Alan Cox <alc@FreeBSD.org>

Correct an error in pmap_change_attr()'s initial loop that verifies that the
given range of addresses are mapped. Previously, the loop was testing the
same address every time.

Submitted by: Magesh Dhasayyan


# 53d13c60 18-Jul-2008 Alan Cox <alc@FreeBSD.org>

Simplify pmap_extract()'s control flow, making it more like the related
functions pmap_extract_and_hold() and pmap_kextract().


# cc82a18b 07-Jul-2008 Alan Cox <alc@FreeBSD.org>

In FreeBSD 7.0 and beyond, pmap_growkernel() should pass VM_ALLOC_INTERRUPT
to vm_page_alloc() instead of VM_ALLOC_SYSTEM. VM_ALLOC_SYSTEM was the
logical choice before FreeBSD 7.0 because VM_ALLOC_INTERRUPT could not
reclaim a cached page. Simply put, there was no ordering between
VM_ALLOC_INTERRUPT and VM_ALLOC_SYSTEM as to which "dug deeper" into the
cache and free queues. Now, there is; VM_ALLOC_INTERRUPT dominates
VM_ALLOC_SYSTEM.

While I'm here, teach pmap_growkernel() to request a prezeroed page.

MFC after: 1 week


# 1ec1304b 17-May-2008 Alan Cox <alc@FreeBSD.org>

Retire pmap_addr_hint(). It is no longer used.


# ef4d480c 11-May-2008 Alan Cox <alc@FreeBSD.org>

Correct an error in pmap_align_superpage(). Specifically, correctly
handle the case where the mapping is greater than a superpage in size
but the alignment of the physical pages spans a superpage boundary.


# d3249b14 09-May-2008 Alan Cox <alc@FreeBSD.org>

Introduce pmap_align_superpage(). It increases the starting virtual
address of the given mapping if a different alignment might result in more
superpage mappings.


# 26b77ff3 25-Apr-2008 Alan Cox <alc@FreeBSD.org>

Always use PG_PS_FRAME to extract the physical address of a 2/4MB page
from a PDE.


# f4d2c7f1 10-Apr-2008 Alan Cox <alc@FreeBSD.org>

Correct pmap_copy()'s method for extracting the physical address of a
2/4MB page from a PDE. Specifically, change it to use PG_PS_FRAME,
not PG_FRAME, to extract the physical address of a 2/4MB page from a
PDE.

Change the last argument passed to pmap_pv_insert_pde() from a
vm_page_t representing the first 4KB page of a 2/4MB page to the
vm_paddr_t of the 2/4MB page. This avoids an otherwise unnecessary
conversion from a vm_paddr_t to a vm_page_t in pmap_copy().


# 109d4932 07-Apr-2008 Alan Cox <alc@FreeBSD.org>

Update pmap_page_wired_mappings() so that it counts 2/4MB page mappings.


# 7630c265 04-Apr-2008 Alan Cox <alc@FreeBSD.org>

Reintroduce UMA_SLAB_KMAP; however, change its spelling to
UMA_SLAB_KERNEL for consistency with its sibling UMA_SLAB_KMEM.
(UMA_SLAB_KMAP met its original demise in revision 1.30 of
vm/uma_core.c.) UMA_SLAB_KERNEL is now required by the jumbo frame
allocators. Without it, UMA cannot correctly return pages from the
jumbo frame zones to the VM system because it resets the pages' object
field to NULL instead of the kernel object. In more detail, the jumbo
frame zones are created with the option UMA_ZONE_REFCNT. This causes
UMA to overwrite the pages' object field with the address of the slab.
However, when UMA wants to release these pages, it doesn't know how to
restore the object field, so it sets it to NULL. This change teaches
UMA how to reset the object field to the kernel object.

Crashes reported by: kris
Fix tested by: kris
Fix discussed with: jeff
MFC after: 6 weeks


# 4ae6e474 28-Mar-2008 Alan Cox <alc@FreeBSD.org>

Eliminate an #if 0/#endif that was unintentionally introduced
by the previous revision.


# 96a6e6e6 28-Mar-2008 Brooks Davis <brooks@FreeBSD.org>

Use ; instead of : to end a line.

Submitted by: Niclas Zeising <niclas dot zeising at gmail dot com>


# 6e7534b8 27-Mar-2008 Paul Saab <ps@FreeBSD.org>

Add support to mincore for detecting whether a page is part of a
"super" page or not.

Reviewed by: alc, ups


# 97dbe5e4 26-Mar-2008 Alan Cox <alc@FreeBSD.org>

MFamd64 with few changes:

1. Add support for automatic promotion of 4KB page mappings to 2MB page
mappings. Automatic promotion can be enabled by setting the tunable
"vm.pmap.pg_ps_enabled" to a non-zero value. By default, automatic
promotion is disabled. Tested by: kris

2. To date, we have assumed that the TLB will only set the PG_M bit in a
PTE if that PTE has the PG_RW bit set. However, this assumption does
not hold on recent processors from Intel. For example, consider a PTE
that has the PG_RW bit set but the PG_M bit clear. Suppose this PTE
is cached in the TLB and later the PG_RW bit is cleared in the PTE,
but the corresponding TLB entry is not (yet) invalidated.
Historically, upon a write access using this (stale) TLB entry, the
TLB would observe that the PG_RW bit had been cleared and initiate a
page fault, aborting the setting of the PG_M bit in the PTE. Now,
however, P4- and Core2-family processors will set the PG_M bit before
observing that the PG_RW bit is clear and initiating a page fault. In
other words, the write does not occur but the PG_M bit is still set.

The real impact of this difference is not that great. Specifically,
we should no longer assert that any PTE with the PG_M bit set must
also have the PG_RW bit set, and we should ignore the state of the
PG_M bit unless the PG_RW bit is set.


# 3f7905d2 23-Mar-2008 Konstantin Belousov <kib@FreeBSD.org>

Prevent the overflow in the calculation of the next page directory.
The overflow causes the wraparound with consequent corruption of the
(almost) whole address space mapping.

As Alan noted, pmap_copy() does not require the wrap-around checks
because it cannot be applied to the kernel's pmap. The checks there are
included for consistency.

Reported and tested by: kris (i386/pmap.c:pmap_remove() part)
Reviewed by: alc
MFC after: 1 week


# 6634dbbd 17-Jan-2008 Alan Cox <alc@FreeBSD.org>

Retire PMAP_DIAGNOSTIC. Any useful diagnostics that were conditionally
compiled under PMAP_DIAGNOSTIC are now KASSERT()s. (Note: The kernel
option DIAGNOSTIC still disables inlining of certain pmap functions.)

Eliminate dead code from pmap_enter(). This code implemented an assertion.
On i386, an equivalent check is already implemented. However, on amd64,
a small change is required to implement an equivalent check.

Eliminate \n from a nearby panic string.

Use KASSERT() to reimplement pmap_copy()'s two assertions.


# a658a1e0 14-Jan-2008 Peter Wemm <peter@FreeBSD.org>

Add a CTASSERT that KERNBASE is valid. This is usually messed up by an
invalid KVA_PAGES, so add a pointer to there.


# fa093ee2 08-Jan-2008 Alan Cox <alc@FreeBSD.org>

Convert a PMAP_DIAGNOSTIC to a KASSERT.


# 5cccf586 06-Jan-2008 Alan Cox <alc@FreeBSD.org>

Shrink the size of struct vm_page on amd64 and i386 by eliminating
pv_list_count from struct md_page. Ever since Peter rewrote the pv
entry allocator for amd64 and i386 pv_list_count has been correctly
maintained but otherwise unused.


# eb2a0517 03-Jan-2008 Alan Cox <alc@FreeBSD.org>

Add an access type parameter to pmap_enter(). It will be used to implement
superpage promotion.

Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is
a Boolean.


# 86f14493 02-Jan-2008 Alan Cox <alc@FreeBSD.org>

Provide a legitimate pindex to vm_page_alloc() in pmap_growkernel()
instead of writing apologetic comments. As it turns out, I need every
kernel page table page to have a legitimate pindex to support superpage
promotion on kernel memory.

Correct a nearby style error: Pointers should be compared to NULL.


# dbfb54ff 09-Dec-2007 Alan Cox <alc@FreeBSD.org>

Eliminate compilation warnings due to the use of non-static inlines
through the introduction and use of the __gnu89_inline attribute.

Submitted by: bde (i386)
MFC after: 3 days


# d1ce3dfa 04-Dec-2007 Alan Cox <alc@FreeBSD.org>

Correct an error under COUNT_IPIS within pmap_lazyfix_action(): Increment
the counter that the pointer refers to, not the pointer.

MFC after: 3 days


# 58041e4b 30-Nov-2007 Alan Cox <alc@FreeBSD.org>

Improve get_pv_entry()'s handling of low-memory conditions. After page
allocation fails and pv entries are reclaimed, there may be an unused pv
entry in a pv chunk that survived the reclamation. However, previously,
after reclamation, get_pv_entry() did not look for an unused pv entry in
a surviving pv chunk; it simply retried the page allocation. Now, it
does look for an unused pv entry before retrying the page allocation.

Note: This only applies to RELENG_7. Earlier branches use a different
pv entry allocator.

MFC after: 6 weeks


# 59677d3c 17-Nov-2007 Alan Cox <alc@FreeBSD.org>

Prevent the leakage of wired pages in the following circumstances:
First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated.
Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the
pages beyond the EOF are unmapped and freed. However, when the file is
mlock(2)ed, the pages beyond the EOF are unmapped but not freed because
they have a non-zero wire count. This can be a mistake. Specifically,
it is a mistake if the sole reason why the pages are wired is because of
wired, managed mappings. Previously, unmapping the pages destroys these
wired, managed mappings, but does not reduce the pages' wire count.
Consequently, when the file is unmapped, the pages are not unwired
because the wired mapping has been destroyed. Moreover, when the vm
object is finally destroyed, the pages are leaked because they are still
wired. The fix is to reduce the pages' wired count by the number of
wired, managed mappings destroyed. To do this, I introduce a new pmap
function pmap_page_wired_mappings() that returns the number of managed
mappings to the given physical page that are wired, and I use this
function in vm_object_page_remove().

Reviewed by: tegge
MFC after: 6 weeks


# 6dd3a6c0 13-Nov-2007 Peter Wemm <peter@FreeBSD.org>

Drastically simplify the i386 pcpu backend by merging parts of the
amd64 mechanism over. Instead of page table hackery that isn't
actually needed, just use 'struct pcpu __pcpu[MAXCPU]' for backing like
all the other platforms do. Get rid of 'struct privatespace' and a
while mess of #ifdef SMP garbage that set it up. As a bonus, this
returns the 4MB of KVA that we stole to implement it the old way.
This also allows you to read the pcpu data for each cpu when reading a
minidump.

Background information: Originally, pcpu stuff was implemented as having
per-cpu page tables and magic to make different data structures appear
at the same actual address. In order to share page tables, we switched
to using the GDT and %fs/%gs to access it. But we still did the evil
magic to set it up for the old way. The "idle stacks" are not used
for the idle process anymore and are just used for a few functions during
bootup, then ignored. (excercise for reader: free these afterwards).


# 605385f8 05-Nov-2007 Alan Cox <alc@FreeBSD.org>

Add comments explaining why all stores updating a non-kernel page table
must be globally performed before calling any of the TLB invalidation
functions.

With one exception, on amd64, this requirement was already met. Fix this
one case. Also, as a clarification, change an existing atomic op into a
release. (Suggested by: jhb)

Reported and reviewed by: ups
MFC after: 3 days


# 89b57fcf 05-Nov-2007 Konstantin Belousov <kib@FreeBSD.org>

Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with: Peter Holm
Reviewed by: jhb


# 6afd4b92 02-Nov-2007 Alan Cox <alc@FreeBSD.org>

Eliminate spurious "Approaching the limit on PV entries, ..."
warnings. Specifically, whenever vm_page_alloc(9) returned NULL to
get_pv_entry(), we issued a warning regardless of the number of pv
entries in use. (Note: The older pv entry allocator in RELENG_6 does
not have this problem.)

Reported by: Jeremy Chadwick

Eliminate the direct call to pagedaemon_wakeup() by get_pv_entry().
This was a holdover from earlier times when the page daemon was
responsible for the reclamation of pv entries.

MFC after: 5 days


# 8beae253 20-Aug-2007 Alan Cox <alc@FreeBSD.org>

In general, when we map a page into the kernel's address space, we no
longer create a pv entry for that mapping. (The two exceptions are
mappings into the kernel's exec and pipe submaps.) Consequently, there is
no reason for get_pv_entry() to dig deep into the free page queues, i.e.,
use VM_ALLOC_SYSTEM, by default. This revision changes get_pv_entry() to
use VM_ALLOC_NORMAL by default, i.e., before calling pmap_collect() to
reclaim pv entries.

Approved by: re (kensmith)


# ba4b85e4 01-Jul-2007 Alan Cox <alc@FreeBSD.org>

Pages that do belong to an object and page queue can now be freed without
holding the page queues lock. Thus, the page table pages released by
pmap_remove() and pmap_remove_pages() can be freed after the page queues
lock is released.

Approved by: re (kensmith)


# 2feb50bf 31-May-2007 Attilio Rao <attilio@FreeBSD.org>

Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)


# 80b200da 20-May-2007 Jeff Roberson <jeff@FreeBSD.org>

- rename VMCNT_DEC to VMCNT_SUB to reflect the count argument.

Suggested by: julian@
Contributed by: attilio@


# 222d0195 18-May-2007 Jeff Roberson <jeff@FreeBSD.org>

- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.

Contributed by: Attilio Rao <attilio@FreeBSD.org>


# 31b4f4a9 21-Apr-2007 Stephan Uphoff <ups@FreeBSD.org>

Modify TLB invalidation handling.

Reviewed by: alc@, peter@
MFC after: 1 week


# 0b765048 13-Apr-2007 Alan Cox <alc@FreeBSD.org>

Eliminate the misuse of PG_FRAME to truncate a virtual address to a virtual
page boundary.

Reviewed by: ru@


# 2e137367 06-Apr-2007 Ruslan Ermilov <ru@FreeBSD.org>

Add the PG_NX support for i386/PAE.

Reviewed by: alc


# 8a5e898d 24-Mar-2007 Alan Cox <alc@FreeBSD.org>

In order to satisfy ACPI's need for an identity mapping, modify the
temporary mapping created by locore so that the lowest two to four
megabytes can become a permanent identity mapping. This implementation
avoids any use of a large page mapping.


# 8cfba726 17-Mar-2007 Alan Cox <alc@FreeBSD.org>

Eliminate an unused parameter.


# 9b0df55b 14-Mar-2007 Nate Lawson <njl@FreeBSD.org>

Create an identity mapping (V=P) super page for the low memory region on
boot. Then, just switch to the kernel pmap when suspending instead of
allocating/freeing our own mapping every time. This should solve a panic
of pmap_remove() being called with interrupts disabled. Thanks to Alan
Cox for developing this patch.

Note: this means that ACPI requires super page (PG_PS) support in the CPU.
This has been present since the Pentium and first documented in the
Pentium Pro. However, it may need to be revisited later.

Submitted by: alc
MFC after: 1 month


# 8da3fc95 05-Mar-2007 Alan Cox <alc@FreeBSD.org>

Acquiring smp_ipi_mtx on every call to pmap_invalidate_*() is wasteful.
For example, during a buildworld more than half of the calls do not
generate an IPI because the only TLB entry invalidated is on the calling
processor. This revision pushes down the acquisition and release of
smp_ipi_mtx into smp_tlb_shootdown() and smp_targeted_tlb_shootdown() and
instead uses sched_pin() and sched_unpin() in pmap_invalidate_*() so that
thread migration doesn't lead to a missed TLB invalidation.

Reviewed by: jhb
MFC after: 3 weeks


# ae0663a3 17-Feb-2007 Alan Cox <alc@FreeBSD.org>

Eliminate some acquisitions and releases of the page queues lock that are
no longer necessary.


# f67af5c9 17-Jan-2007 Xin LI <delphij@FreeBSD.org>

Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.


# da449604 19-Nov-2006 Alan Cox <alc@FreeBSD.org>

The global variable avail_end is redundant and only used once. Eliminate
it. Make avail_start static to the pmap on amd64. (It no longer exists
on other architectures.)


# 79ba24ca 16-Nov-2006 Maxim Konovalov <maxim@FreeBSD.org>

o Make pv_maxchunks no less than maxproc. This helps to survive a
forkbomb explosion.

Reviewed by: alc
Security: local DoS
X-MFC atfer: RELENG_6 is not affected due to a different pv_entry
allocation code.


# 44b8bd66 12-Nov-2006 Alan Cox <alc@FreeBSD.org>

Make pmap_enter() responsible for setting PG_WRITEABLE instead
of its caller. (As a beneficial side-effect, a high-contention
acquisition of the page queues lock in vm_fault() is eliminated.)


# 43200cd3ed 21-Oct-2006 Alan Cox <alc@FreeBSD.org>

Eliminate unnecessary PG_BUSY tests.


# d25fdf53 14-Aug-2006 John Baldwin <jhb@FreeBSD.org>

Don't try to preserve PAT bits in pmap_enter(). We currently on pages that
aren't mapped via pmap_enter() (KVA). We will eventually support PAT bits
on user pages, but those will require some sort of MI caching mode stored
in the vm_page.

Reviewed by: alc


# 7e9f73f3 11-Aug-2006 John Baldwin <jhb@FreeBSD.org>

First pass at allowing memory to be mapped using cache modes other than
WB (write-back) on x86 via control bits in PTEs and PDEs (including making
use of the PAT MSR). Changes include:
- A new pmap_mapdev_attr() function for amd64 and i386 which takes an
additional parameter (relative to pmap_mapdev()) specifying the cache
mode for this mapping. Note that on amd64 only WB mappings are done with
the direct map, all other modes result in a private mapping.
- pmap_mapdev() on i386 and amd64 now defaults to using UC (uncached)
mappings rather than WB. Previously we relied on the BIOS setting up
MTRR's to enforce memio regions being treated as UC. This might make
hw.cbb_start_memory unnecessary in some cases now for example.
- A new pmap_mapbios()/pmap_unmapbios() API has been added to allow places
that used pmap_mapdev() to map non-device memory (such as ACPI tables)
to do so using WB as before.
- A new pmap_change_attr() function for amd64 and i386 that changes the
caching mode for a range of KVA.

Reviewed by: alc


# 14aaab53 06-Aug-2006 Alan Cox <alc@FreeBSD.org>

Eliminate the acquisition and release of the page queues lock around a call
to vm_page_sleep_if_busy().


# 78985e42 01-Aug-2006 Alan Cox <alc@FreeBSD.org>

Complete the transition from pmap_page_protect() to pmap_remove_write().
Originally, I had adopted sparc64's name, pmap_clear_write(), for the
function that is now pmap_remove_write(). However, this function is more
like pmap_remove_all() than like pmap_clear_modify() or
pmap_clear_reference(), hence, the name change.

The higher-level rationale behind this change is described in
src/sys/amd64/amd64/pmap.c revision 1.567. The short version is that I'm
trying to clean up and fix our support for execute access.

Reviewed by: marcel@ (ia64)


# 3cad40e5 20-Jul-2006 Alan Cox <alc@FreeBSD.org>

Add pmap_clear_write() to the interface between the virtual memory
system's machine-dependent and machine-independent layers. Once
pmap_clear_write() is implemented on all of our supported
architectures, I intend to replace all calls to pmap_page_protect() by
calls to pmap_clear_write(). Why? Both the use and implementation of
pmap_page_protect() in our virtual memory system has subtle errors,
specifically, the management of execute permission is broken on some
architectures. The "prot" argument to pmap_page_protect() should
behave differently from the "prot" argument to other pmap functions.
Instead of meaning, "give the specified access rights to all of the
physical page's mappings," it means "don't take away the specified
access rights from all of the physical page's mappings, but do take
away the ones that aren't specified." However, owing to our i386
legacy, i.e., no support for no-execute rights, all but one invocation
of pmap_page_protect() specifies VM_PROT_READ only, when the intent
is, in fact, to remove only write permission. Consequently, a
faithful implementation of pmap_page_protect(), e.g., ia64, would
remove execute permission as well as write permission. On the other
hand, some architectures that support execute permission have
basically ignored whether or not VM_PROT_EXECUTE is passed to
pmap_page_protect(), e.g., amd64 and sparc64. This change represents
the first step in replacing pmap_page_protect() by the less subtle
pmap_clear_write() that is already implemented on amd64, i386, and
sparc64.

Discussed with: grehan@ and marcel@


# 7c9cc27f 17-Jul-2006 Alan Cox <alc@FreeBSD.org>

MFamd64
pmap_clear_ptes() is already convoluted. This will worsen with the
implementation of superpages. Eliminate it and add pmap_clear_write().

There are no functional changes. Checked by: md5


# e4cec283 16-Jul-2006 Alan Cox <alc@FreeBSD.org>

Now that free_pv_entry() accesses the pmap, call free_pv_entry() in
pmap_remove_all() before rather than after the pmap is unlocked. At
present, the page queues lock provides sufficient sychronization. In the
future, the page queues lock may not always be held when free_pv_entry() is
called.


# d259662b 16-Jul-2006 Alan Cox <alc@FreeBSD.org>

MFamd64
Make three simplifications to pmap_ts_referenced():
Eliminate an initialized but otherwise unused variable.
Eliminate an unnecessary test.
Exit the loop in a shorter way.


# ddd6244a 16-Jul-2006 Alan Cox <alc@FreeBSD.org>

Eliminate the remaining uses of "register".

Convert the remaining K&R-style function declarations to ANSI-style.

Eliminate excessive white space from pmap_ts_referenced().


# d3dd65ab 15-Jul-2006 Alan Cox <alc@FreeBSD.org>

Make pc_freemask an array of uint32_t, rather than uint64_t. (I believe
that the use of the latter is simply an oversight in porting the new pv
entry code from amd64.)


# da536e63 02-Jul-2006 Alan Cox <alc@FreeBSD.org>

Correct an error in the new pmap_collect(), thus only affecting HEAD.
Specifically, the pv entry was always being freed to the caller's pmap
instead of the pmap to which the pv entry belongs.


# 8e0e1e22 26-Jun-2006 Alan Cox <alc@FreeBSD.org>

Correct a very old and very obscure bug: vmspace_fork() calls
pmap_copy() if the mapping is VM_INHERIT_SHARE. Suppose the mapping
is also wired. vmspace_fork() clears the wiring attributes in the vm
map entry but pmap_copy() copies the PG_W attribute in the PTE. I
don't think this is catastrophic. It blocks pmap_remove_pages() from
destroying the mapping and corrupts the pmap's wiring count.

This revision fixes the problem by changing pmap_copy() to clear the
PG_W attribute.

Reviewed by: tegge@


# 031bf5ea 25-Jun-2006 Alan Cox <alc@FreeBSD.org>

Eliminate a comment that became stale after revision 1.547.


# f0544664 20-Jun-2006 Alan Cox <alc@FreeBSD.org>

Change get_pv_entry() such that the call to vm_page_alloc() specifies
VM_ALLOC_NORMAL instead of VM_ALLOC_SYSTEM when try is TRUE. In other
words, when get_pv_entry() is permitted to fail, it no longer tries as
hard to allocate a page.

Change pmap_enter_quick_locked() to fail rather than wait if it is
unable to allocate a page table page. This prevents a race between
pmap_enter_object() and the page daemon. Specifically, an inactive
page that is a successor to the page that was given to
pmap_enter_quick_locked() might become a cache page while
pmap_enter_quick_locked() waits and later pmap_enter_object() maps
the cache page violating the invariant that cache pages are never
mapped. Similarly, change
pmap_enter_quick_locked() to call pmap_try_insert_pv_entry() rather
than pmap_insert_entry(). Generally speaking,
pmap_enter_quick_locked() is used to create speculative mappings. So,
it should not try hard to allocate memory if free memory is scarce.

Add an assertion that the object containing m_start is locked in
pmap_enter_object(). Remove a similar assertion from
pmap_enter_quick_locked() because that function no longer accesses the
containing object.

Remove a stale comment.

Reviewed by: ups@


# 2053c127 14-Jun-2006 Stephan Uphoff <ups@FreeBSD.org>

Remove mpte optimization from pmap_enter_quick().
There is a race with the current locking scheme and removing
it should have no measurable performance impact.
This fixes page faults leading to panics in pmap_enter_quick_locked()
on amd64/i386.

Reviewed by: alc,jhb,peter,ps


# b74a62d6 12-Jun-2006 Alan Cox <alc@FreeBSD.org>

Don't invalidate the TLB in pmap_qenter() unless the old mapping was valid.
Most often, it isn't.

Reviewed by: tegge@


# ce142d9e 05-Jun-2006 Alan Cox <alc@FreeBSD.org>

Introduce the function pmap_enter_object(). It maps a sequence of resident
pages from the same object. Use it in vm_map_pmap_enter() to reduce the
locking overhead of premapping objects.

Reviewed by: tegge@


# 62b5e735 05-Jun-2006 Alan Cox <alc@FreeBSD.org>

MFamd64
Eliminate unnecessary, recursive acquisitions and releases of the page
queues lock by free_pv_entry() and pmap_remove_pages().

Reduce the scope of the page queues lock in pmap_remove_pages().


# 2b8a339c 01-May-2006 John Baldwin <jhb@FreeBSD.org>

Add various constants for the PAT MSR and the PAT PTE and PDE flags.
Initialize the PAT MSR during boot to map PAT type 2 to Write-Combining
(WC) instead of Uncached (UC-).

MFC after: 1 month


# 4ac60df5 01-May-2006 John Baldwin <jhb@FreeBSD.org>

Add a new 'pmap_invalidate_cache()' to flush the CPU caches via the
wbinvd() instruction. This includes a new IPI so that all CPU caches on
all CPUs are flushed for the SMP case.

MFC after: 1 month


# ada5d7d5 01-May-2006 Peter Wemm <peter@FreeBSD.org>

Using an idea from Stephan Uphoff, use the empty pte's that correspond
to the unused kva in the pv memory block to thread a freelist through.
This allows us to free pages that used to be used for pv entry chunks
since we can now track holes in the kva memory block.

Idea from: ups


# 4c8eff70 01-May-2006 Peter Wemm <peter@FreeBSD.org>

Fix missing changes required for the amd64->i386 conversion. Add the
missing VM_ALLOC_WIRED flags to vm_page_alloc() calls I added.

Submitted by: alc


# 7eeda227 28-Apr-2006 Peter Wemm <peter@FreeBSD.org>

Interim fix for pmap problems I introduced with my last commit.
Remove the code to dyanmically change the pv_entry limits. Go back
to a single fixed kva reservation for pv entries, like was done
before when using the uma zone. Go back to never freeing pages
back to the free pool after they are no longer used, just like
before.

This stops the lock order reversal due to aquiring the kernel map
lock while pmap was locked.

This fixes the recursive panic if invariants are enabled.

The problem was that allocating/freeing kva causes vm_map_entry
nodes to be allocated/freed. That can recurse back into pmap as
new pages are hooked up to kvm and hence all the problem.
Allocating/freeing kva indirectly allocate/frees memory.

So, by going back to a single fixed size kva block and an index,
we avoid the recursion panics and the LOR.

The problem is that now with a linear block of kva, we have no
mechanism to track holes once pages are freed. UMA has the same
problem when using custom object for a zone and a fixed reservation
of kva. Simple solutions like having a bitmap would work, but would
be very inefficient when there are hundreds of thousands of bits
in the map. A first-free pointer is similarly flawed because pages
can be freed at random and the first-free pointer would be rewinding
huge amounts. If we could allocate memory for tree strucures or
an external freelist, that would work. Except we cannot allocate/free
memory here because we cannot allocate/free address space to use
it in. Anyway, my change here reverts back to the UMA behavior of
not freeing pages for now, thereby avoiding holes in the map.

ups@ had a truely evil idea that I'll investigate. It should allow
freeing unused pages again by giving us a no-cost way to track the
holes in the kva block. But in the meantime, this should get people
booting with witness and/or invariants again.

Footnote: amd64 doesn't have this problem because of the direct map
access method. I'd done all my witness/invariants testing there. I'd
never considered that the harmless-looking kmem_alloc/kmem_free calls
would cause such a problem and it didn't show up on the boot test.


# 7dece6c7 27-Apr-2006 Alan Cox <alc@FreeBSD.org>

In general, bits in the page directory entry (PDE) and the page table
entry (PTE) have the same meaning. The exception to this rule is the
eighth bit (0x080). It is the PS bit in a PDE and the PAT bit in a
PTE. This change avoids the possibility that pmap_enter() confuses a
PAT bit with a PS bit, avoiding a panic().

Eliminate a diagnostic printf() from the i386 pmap_enter() that serves
no current purpose, i.e., I've seen no bug reports in the last two
years that are helped by this printf().

Reviewed by: jhb


# 027ed650 26-Apr-2006 Xin LI <delphij@FreeBSD.org>

Fix build on i386


# 041a991f 26-Apr-2006 Peter Wemm <peter@FreeBSD.org>

MFamd64: shrink pv entries from 24 bytes to about 12 bytes. (336 pv entries
per page = effectively 12.19 bytes per pv entry after overheads).
Instead of using a shared UMA zone for 24 byte pv entries (two 8-byte tailq
nodes, a 4 byte pointer, and a 4 byte address), we allocate a page at a
time per process. This provides 336 pv entries per process (actually, per
pmap address space) and eliminates one of the 8-byte tailq entries since
we now can track per-process pv entries implicitly. The pointer to
the pmap can be eliminated by doing address arithmetic to find the metadata
on the page headers to find a single pointer shared by all 336 entries.
There is an 11-int bitmap for the freelist of those 336 entries.

This is mostly a mechanical conversion from amd64, except:
* i386 has to allocate kvm and map the pages, amd64 has them outside of kvm
* native word size is smaller, so bitmaps etc become 32 bit instead of 64
* no dump_add_page() etc stuff because they are in kvm always.
* various pmap internals tweaks because pmap uses direct map on amd64 but
on i386 it has to use sched_pin and temporary mappings.

Also, sysctl vm.pmap.pv_entry_max and vm.pmap.shpgperproc are now
dynamic sysctls. Like on amd64, i386 can now tune the pv entry limits
without a recompile or reboot.

This is important because of the following scenario. If you have a 1GB
file (262144 pages) mmap()ed into 50 processes, that requires 13 million
pv entries. At 24 bytes per pv entry, that is 314MB of ram and kvm, while
at 12 bytes it is 157MB. A 157MB saving is significant.

Test-run by: scottl (Thanks!)


# 826c2072 11-Apr-2006 Alan Cox <alc@FreeBSD.org>

Retire pmap_track_modified(). We no longer need it because we do not
create managed mappings within the clean submap. To prevent regressions,
add assertions blocking the creation of managed mappings within the clean
submap.

Reviewed by: tegge


# b9eee07e 03-Apr-2006 Peter Wemm <peter@FreeBSD.org>

Remove the unused sva and eva arguments from pmap_remove_pages().


# 9c6a71e4 01-Apr-2006 Alan Cox <alc@FreeBSD.org>

Introduce pmap_try_insert_pv_entry(), a function that conditionally creates
a pv entry if the number of entries is below the high water mark for pv
entries.

Use pmap_try_insert_pv_entry() in pmap_copy() instead of
pmap_insert_entry(). This avoids possible recursion on a pmap lock in
get_pv_entry().

Eliminate the explicit low-memory checks in pmap_copy(). The check that
the number of pv entries was below the high water mark was largely
ineffective because it was located in the outer loop rather than the
inner loop where pv entries were allocated. Instead of checking, we
attempt the allocation and handle the failure.

Reviewed by: tegge
Reported by: kris
MFC after: 5 days


# fa8053e9 21-Mar-2006 Alan Cox <alc@FreeBSD.org>

Eliminate unnecessary invalidations of the entire TLB by pmap_remove().
Specifically, on mappings with PG_G set pmap_remove() not only performs
the necessary per-page invlpg invalidations but also performs an
unnecessary invalidation of the entire set of non-PG_G entries.

Reviewed by: tegge


# 39d3e619 20-Mar-2006 David Xu <davidxu@FreeBSD.org>

Remove stale KSE code.

Reviewed by: alc


# 6bd7e81d 16-Feb-2006 Tor Egge <tegge@FreeBSD.org>

Rounding addr upwards to next 4M or 2M boundary in pmap_growkernel() could
cause addr to become 0, resulting in an early return without populating
the last PDE.

Reviewed by: alc


# f0b98139 05-Dec-2005 John Baldwin <jhb@FreeBSD.org>

- Move the code to deal with handling an IPI_STOP IPI out of
ipi_nmi_handler() and into a new cpustop_handler() function. Change
the Xcpustop IPI_STOP handler to call this function instead of
duplicating all the same logic in assembly.
- EOI the local APIC for the lapic timer interrupt in C rather than
assembly.
- Bump the lazypmap IPI counter if COUNT_IPIS is defined in C rather than
assembly.


# 97a0c226 19-Nov-2005 Alan Cox <alc@FreeBSD.org>

Eliminate pmap_init2(). It's no longer used.


# 421552a5 13-Nov-2005 Warner Losh <imp@FreeBSD.org>

Provide a dummy NO_XBOX option that lives in opt_xbox.h for pc98.
This allows us to eliminate a three ifdef PC98 instances.


# 65336314 12-Nov-2005 Alan Cox <alc@FreeBSD.org>

In get_pv_entry() use PMAP_LOCK() instead of PMAP_TRYLOCK() when deadlock
cannot possibly occur.


# 1ba0023e 08-Nov-2005 Yoshihiro Takahashi <nyan@FreeBSD.org>

Fix pc98 build.


# 7a35a21e 09-Nov-2005 Alan Cox <alc@FreeBSD.org>

Reimplement the reclamation of PV entries. Specifically, perform
reclamation synchronously from get_pv_entry() instead of
asynchronously as part of the page daemon. Additionally, limit the
reclamation to inactive pages unless allocation from the PV entry zone
or reclamation from the inactive queue fails. Previously, reclamation
destroyed mappings to both inactive and active pages. get_pv_entry()
still, however, wakes up the page daemon when reclamation occurs. The
reason being that the page daemon may move some pages from the active
queue to the inactive queue, making some new pages available to future
reclamations.

Print the "reclaiming PV entries" message at most once per minute, but
don't stop printing it after the fifth time. This way, we do not give
the impression that the problem has gone away.

Reviewed by: tegge


# 51ef421d 08-Nov-2005 Warner Losh <imp@FreeBSD.org>

Add support for XBOX to the FreeBSD port. The xbox architecture is
nearly identical to wintel/ia32, with a couple of tweaks. Since it is
so similar to ia32, it is optionally added to a i386 kernel. This
port is preliminary, but seems to work well. Further improvements
will improve the interaction with syscons(4), port Linux nforce driver
and future versions of the xbox.

This supports the 64MB and 128MB boxes. You'll need the most recent
CVS version of Cromwell (the Linux BIOS for the XBOX) to boot.

Rink will be maintaining this port, and is interested in feedback.
He's setup a website http://xbox-bsd.nl to report the latest
developments.

Any silly mistakes are my fault.

Submitted by: Rink P.W. Springer rink at stack dot nl and
Ed Schouten ed at fxq dot nl


# e9cb1037 04-Nov-2005 Alan Cox <alc@FreeBSD.org>

Begin and end the initialization of pvzone in pmap_init().
Previously, pvzone's initialization was split between pmap_init() and
pmap_init2(). This split initialization was the underlying cause of
some UMA panics during initialization. Specifically, if the UMA boot
pages was exhausted before the pvzone was fully initialized, then UMA,
through no fault of its own, would use an inappropriate back-end
allocator leading to a panic. (Previously, as a workaround, we have
increased the UMA boot pages.) Fortunately, there is no longer any
reason that pvzone's initialization cannot be completed in
pmap_init().

Eliminate a check for whether pv_entry_high_water has been initialized
or not from get_pv_entry(). Since pvzone's initialization is
completed in pmap_init(), this check is no longer needed.

Use cnt.v_page_count, the actual count of available physical pages,
instead of vm_page_array_size to compute the maximum number of pv
entries.

Introduce the vm.pmap.pv_entries tunable on alpha and ia64.

Eliminate some unnecessary white space.

Discussed with: tegge (item #1)
Tested by: marcel (ia64)


# f7118bdf 31-Oct-2005 Alan Cox <alc@FreeBSD.org>

Instead of a panic()ing in pmap_insert_entry() if get_pv_entry()
fails, reclaim a pv entry by destroying a mapping to an inactive
page.

Change the format strings in many of the assertions that were recently
converted from PMAP_DIAGNOSTIC printf()s so that they are compatible
with PAE. Avoid unnecessary differences between the amd64 and i386
format strings.


# 6fb8d0e3 30-Oct-2005 Alan Cox <alc@FreeBSD.org>

Replace diagnostic printf()s by assertions. Use consistent style for
similar assertions.


# 8d228514 21-Oct-2005 Ade Lovett <ade@FreeBSD.org>

Specifically panic() in the case where pmap_insert_entry() fails to
get a new pv under high system load where the available pv entries
have been exhausted before the pagedaemon has a chance to wake up
to reclaim some.

Prior to this, the NULL pointer dereference ended up causing
secondary panics with rather less than useful resulting tracebacks.

Reviewed by: alc, jhb
MFC after: 1 week


# 3be99ffc 04-Sep-2005 Alan Cox <alc@FreeBSD.org>

Eliminate unnecessary TLB invalidations by pmap_enter(). Specifically,
eliminate TLB invalidations when permissions are relaxed, such as when a
read-only mapping is changed to a read/write mapping. Additionally,
eliminate TLB invalidations when bits that are ignored by the hardware,
such as PG_W ("wired mapping"), are changed.

Reviewed by: tegge


# ba8bca61 03-Sep-2005 Alan Cox <alc@FreeBSD.org>

Pass a value of type vm_prot_t to pmap_enter_quick() so that it determine
whether the mapping should permit execute access.


# f564b2d2 27-Aug-2005 Alan Cox <alc@FreeBSD.org>

MFamd64 revision 1.526
When pmap_allocpte() destroys a 2/4MB "superpage" mapping it does not
reduce the pmap's resident count accordingly. It should.


# 96e51094 14-Aug-2005 Alan Cox <alc@FreeBSD.org>

Simplify the page table page reference counting by pmap_enter()'s change of
mapping case.

Eliminate a stale comment from pmap_enter().

Reviewed by: tegge


# 50b33450 11-Aug-2005 Alan Cox <alc@FreeBSD.org>

Eliminate unneeded diagnostic code.

Eliminate an unused #include. (Kernel stack allocation and deallocation
long ago migrated to the machine-independent code.)


# b69dd0fd 11-Aug-2005 Alan Cox <alc@FreeBSD.org>

Eliminate unneeded diagnostic code.

Reviewed by: tegge


# 8e7a85fa 10-Aug-2005 Alan Cox <alc@FreeBSD.org>

Decouple the unrefing of a page table page from the removal of a pv entry.
In other words, change pmap_remove_entry() such that it no longer unrefs
the page table page. Now, it only removes the pv entry.

Reviewed by: tegge


# 5f2c46d5 07-Aug-2005 Alan Cox <alc@FreeBSD.org>

When support for 2MB/4MB pages was added in revision 1.148 an error was
made in pmap_protect(): The pmap's resident count should not be reduced
unless mappings are removed.

The errant change to the pmap's resident count could result in a later
pmap_remove() failing to remove any mappings if the errant change has set
the pmap's resident count to zero.


# bca79029 29-Jul-2005 John Baldwin <jhb@FreeBSD.org>

Fix a bug in pmap_protect() in the PAE case where it would try to look up
the vm_page_t associated with a pte using only the lower 32-bits of the pte
instead of the full 64-bits.

Submitted by: Greg Taleck greg at isilon dot com
Reviewed by: jeffr, alc
MFC after: 3 days


# 60baed37 02-Jul-2005 Xin LI <delphij@FreeBSD.org>

Remove the CPU_ENABLE_SSE option from the i386 and pc98 architectures,
as they are already default for I686_CPU for almost 3 years, and
CPU_DISABLE_SSE always disables it. On the other hand, CPU_ENABLE_SSE
does not work for I486_CPU and I586_CPU.

This commit has:
- Removed the option from conf/options.*
- Removed the option and comments from MD NOTES files
- Simplified the CPU_ENABLE_SSE ifdef's so they don't
deal with CPU_ENABLE_SSE from kernel configuration. (*)

For most users, this commit should be largely no-op. If you used to
place CPU_ENABLE_SSE into your kernel configuration for some reason,
it is time to remove it.

(*) The ifdef's of CPU_ENABLE_SSE are not removed at this point, since
we need to change it to !defined(CPU_DISABLE_SSE) && defined(I686_CPU),
not just !defined(CPU_DISABLE_SSE), if we really want to do so.

Discussed on: -arch
Approved by: re (scottl)


# 1c245ae7 09-Jun-2005 Alan Cox <alc@FreeBSD.org>

Introduce a procedure, pmap_page_init(), that initializes the
vm_page's machine-dependent fields. Use this function in
vm_pageq_add_new_page() so that the vm_page's machine-dependent and
machine-independent fields are initialized at the same time.

Remove code from pmap_init() for initializing the vm_page's
machine-dependent fields.

Remove stale comments from pmap_init().

Eliminate the Boolean variable pmap_initialized from the alpha, amd64,
i386, and ia64 pmap implementations. Its use is no longer required
because of the above changes and earlier changes that result in physical
memory that is being mapped at initialization time being mapped without
pv entries.

Tested by: cognet, kensmith, marcel


# fb1b26da 05-Feb-2005 Alan Cox <alc@FreeBSD.org>

Implement proper handling of PG_G mappings in pmap_protect(). (I don't
believe that this omission mattered before the introduction of MemGuard.)

Reviewed by: tegge@
MFC after: 1 week


# 1f70d622 23-Dec-2004 Alan Cox <alc@FreeBSD.org>

Modify pmap_enter_quick() so that it expects the page queues to be locked
on entry and it assumes the responsibility for releasing the page queues
lock if it must sleep.

Remove a bogus comment from pmap_enter_quick().

Using the first change, modify vm_map_pmap_enter() so that the page queues
lock is acquired and released once, rather than each time that a page
is mapped.


# 85f5b245 15-Dec-2004 Alan Cox <alc@FreeBSD.org>

In the common case, pmap_enter_quick() completes without sleeping.
In such cases, the busying of the page and the unlocking of the
containing object by vm_map_pmap_enter() and vm_fault_prefault() is
unnecessary overhead. To eliminate this overhead, this change
modifies pmap_enter_quick() so that it expects the object to be locked
on entry and it assumes the responsibility for busying the page and
unlocking the object if it must sleep. Note: alpha, amd64, i386 and
ia64 are the only implementations optimized by this change; arm,
powerpc, and sparc64 still conservatively busy the page and unlock the
object within every pmap_enter_quick() call.

Additionally, this change is the first case where we synchronize
access to the page's PG_BUSY flag and busy field using the containing
object's lock rather than the global page queues lock. (Modifications
to the page's PG_BUSY flag and busy field have asserted both locks for
several weeks, enabling an incremental transition.)


# 8b902508 06-Dec-2004 Stephan Uphoff <ups@FreeBSD.org>

Move reading the current CPU mask in pmap_lazyfix() to where the thread
is protected from migrating to another CPU.

Approved by: sam (mentor)
MFC after: 4 weeks


# 4878c3cd 01-Dec-2004 Alan Cox <alc@FreeBSD.org>

For efficiency move the call to pmap_pte_quick() out of pmap_protect()'s
and pmap_remove()'s inner loop.

Reviewed by: peter@, tegge@


# 6004362e 26-Nov-2004 David Schultz <das@FreeBSD.org>

Don't include sys/user.h merely for its side-effect of recursively
including other headers.


# 2d68e3fb 16-Nov-2004 John Baldwin <jhb@FreeBSD.org>

Initiate deorbit burn sequence for 80386 support in FreeBSD: Remove
80386 (I386_CPU) support from the kernel.


# 7c76d642 29-Oct-2004 Alan Cox <alc@FreeBSD.org>

Implement per-CPU SYSMAPs, i.e., CADDR* and CMAP*, to reduce lock
contention within pmap_zero_page() and pmap_copy_page().


# aced26ce 08-Oct-2004 Alan Cox <alc@FreeBSD.org>

Make pte_load_store() an atomic operation in all cases, not just i386 PAE.

Restructure pmap_enter() to prevent the loss of a page modified (PG_M) bit
in a race between processors. (This restructuring assumes the newly atomic
pte_load_store() for correct operation.)

Reviewed by: tegge@
PR: i386/61852


# caa665aa 03-Oct-2004 Alan Cox <alc@FreeBSD.org>

Undo revision 1.251. This change was a performance pessimizing work-around
that is no longer required. (In fact, it is not clear that it was ever
required in HEAD or RELENG_4, only RELENG_3 required a work-around.) Now,
as before revision 1.251, if the preexisting PTE is invalid, pmap_enter()
does not call pmap_invalidate_page() to update the TLB(s).

Note: Even with this change, the handling of a copy-on-write fault is
inefficient, in such cases pmap_enter() calls pmap_invalidate_page() twice.

Discussed with: bde@
PR: kern/16568


# 8ceb3dcb 02-Oct-2004 Alan Cox <alc@FreeBSD.org>

The physical address stored in the vm_page is page aligned. There is no
need to mask off the page offset bits. (This operation made some sense
prior to i386/i386/pmap.c revision 1.254 when we passed a physical address
rather than a vm_page pointer to pmap_enter().)


# 07b33039 02-Oct-2004 Alan Cox <alc@FreeBSD.org>

Eliminate unnecessary uses of PHYS_TO_VM_PAGE() from pmap_enter(). These
uses predate the change in the pmap_enter() interface that replaced the
page's physical address by the address of its vm_page structure. The
PHYS_TO_VM_PAGE() was being used to compute the address of the same vm_page
structure that was being passed in.


# 0a752e98 29-Sep-2004 Alan Cox <alc@FreeBSD.org>

Prevent the unexpected deallocation of a page table page while performing
pmap_copy(). This entails additional locking in pmap_copy() and the
addition of a "flags" parameter to the page table page allocator for
specifying whether it may sleep when memory is unavailable. (Already,
pmap_copy() checks the availability of memory, aborting if it is scarce.
In theory, another CPU could, however, allocate memory between
pmap_copy()'s check and the call to the page table page allocator,
causing the current thread to release its locks and sleep. This change
makes this scenario impossible.)

Reviewed by: tegge@


# a9711396 21-Sep-2004 Alan Cox <alc@FreeBSD.org>

Correct a long-standing error in _pmap_unwire_pte_hold() affecting
multiprocessors. Specifically, the error is conditioning the call to
pmap_invalidate_page() on whether the pmap is active on the current CPU.
This call must be unconditional. Regardless of whether the pmap is active
on the CPU performing _pmap_unwire_pte_hold(), it could be active on another
CPU. For example, a call to pmap_remove_all() by the page daemon could
result in a call to _pmap_unwire_pte_hold() with the pmap inactive on the
current CPU and active on another CPU. In such circumstances, failing to
call pmap_invalidate_page() results in a stale TLB entry on the other CPU
that still maps the now deallocated page table page. What happens next is
typically a mysterious panic in pmap_enter() by the other CPU, either
"pmap_enter: attempted pmap_enter on 4MB page" or "pmap_enter: pte vanished,
va: 0x%lx". Both occur because the former page table page has been recycled
and allocated to a new purpose. Consequently, it no longer contains zeroes.

See also Peter's i386/i386/pmap.c revision 1.448 and the related e-mail
thread last year.

Many thanks to the engineers at Sandvine for providing clear and concise
information until all of the pieces of the puzzle fell into place and
for testing an earlier patch.

MT5 Candidate


# de6c3db0 19-Sep-2004 Alan Cox <alc@FreeBSD.org>

Simplify the reference counting of page table pages. Specifically, use
the page table page's wired count rather than its hold count to contain
the reference count. My rationale for this change is based on several
factors:

1. The machine-independent and pmap layers used the same hold count field
in subtly different ways. The machine-independent layer uses the hold
count to implement a form of ephemeral wiring that is used by pipes,
physio, etc. In other words, subsystems where we wish to temporarily
block a page from being swapped out while it is mapped into the kernel's
address space. Such pages are never removed from the page queues.
Instead, the page daemon recognizes a non-zero hold count to mean "hands
off this page." In contrast, page table pages are never in the page
queues; they are wired from birth to death. The hold count was being
used as a kind of reference count, specifically, the number of valid
page table entries within the page. Not surprisingly, these two
different uses imply different synchronization rules: in the machine-
independent layer access to the hold count requires the page queues
lock; whereas in the pmap layer the pmap lock is required. Thus,
continued use by the pmap layer of vm_page_unhold(), which asserts that
the page queues lock is held, made no sense.

2. _pmap_unwire_pte_hold() was too forgiving in its handling of the wired
count. An unexpected wired count on a page table page was ignored and
the underlying page leaked.

3. In a word, microoptimization. Using the wired count exclusively, rather
than a combination of the wired and hold counts, makes the code slightly
smaller and faster.

Reviewed by: tegge@


# 8478ea24 18-Sep-2004 Alan Cox <alc@FreeBSD.org>

Remove an outdated assertion from _pmap_allocpte(). (When vm_page_alloc()
succeeds, the page's queue field is unconditionally set to PQ_NONE by
vm_pageq_remove_nowakeup().)


# 7580b56b 18-Sep-2004 Alan Cox <alc@FreeBSD.org>

Release the page queues lock earlier in pmap_protect() and pmap_remove() in
order to reduce contention.


# 031102cc 12-Sep-2004 Alan Cox <alc@FreeBSD.org>

Use an atomic op to update the pte in pmap_protect(). This is to prevent
the loss of a page modified (PG_M) bit in a race between processors.

Quoting Tor:
One scenario where the old code could cause a lost PG_M bit is a
multithreaded linux program (or FreeBSD program using the
linuxthreads port) where one thread was starting a subprocess.
The thread doing fork() would call vmspace_fork(), which would then
call vm_map_copy_entry() which would call pmap_protect() on an area
possibly accessed by other threads.

Additionally, make the clearing of PG_M by pmap_protect() unconditional if
write permission is removed. Previously, PG_M could persist on a read-only
unmanaged page. That seems inconsistent and confusing.

In collaboration with: tegge@

MT5 candidate
PR: 61852


# 1e7fad6b 11-Sep-2004 Scott Long <scottl@FreeBSD.org>

Revert the previous round of changes to td_pinned. The scheduler isn't
fully initialed when the pmap layer tries to call sched_pini() early in the
boot and results in an quick panic. Use ke_pinned instead as was originally
done with Tor's patch.

Approved by: julian


# 5c854acc 10-Sep-2004 Julian Elischer <julian@FreeBSD.org>

Make up my mind if cpu pinning is stored in the thread structure or the
scheduler specific extension to it. Put it in the extension as
the implimentation details of how the pinning is done needn't be visible
outside the scheduler.

Submitted by: tegge (of course!) (with changes)
MFC after: 3 days


# e232eb82 08-Sep-2004 Alan Cox <alc@FreeBSD.org>

Use atomic ops in pmap_clear_ptes() to prevent SMP races that could
result in the loss of an accessed or modified bit from the pte.

In collaboration with: tegge@

MT5 candidate


# 3c3e8d11 01-Sep-2004 Alan Cox <alc@FreeBSD.org>

Correction to the previous revision: I forgot to apply the ones complement
to a constant. This didn't show in testing because the broken expression
produced the same result in my tests as the correct expression.


# e33353b5 01-Sep-2004 Alan Cox <alc@FreeBSD.org>

Modify pmap_pte() to support its use on non-current, non-kernel pmaps
without holding Giant.


# bfa15df9 29-Aug-2004 Alan Cox <alc@FreeBSD.org>

Remove unnecessary check for curthread == NULL.


# dd68efd0 27-Aug-2004 David E. O'Brien <obrien@FreeBSD.org>

s/smp_rv_mtx/smp_ipi_mtx/g

Requested by: jhb


# 8991a235 27-Aug-2004 Alan Cox <alc@FreeBSD.org>

The machine-independent parts of the virtual memory system always pass a
valid pmap to the pmap functions that require one. Remove the checks for
NULL. (These checks have their origins in the Mach pmap.c that was
integrated into BSD. None of the new code written specifically for
FreeBSD included them.)


# f1009e1e 23-Aug-2004 Peter Wemm <peter@FreeBSD.org>

Commit Doug White and Alan Cox's fix for the cross-ipi smp deadlock.
We were obtaining different spin mutexes (which disable interrupts after
aquisition) and spin waiting for delivery. For example, KSE processes
do LDT operations which use smp_rendezvous, while other parts of the
system are doing things like tlb shootdowns with a different mutex.

This patch uses the common smp_rendezvous mutex for all MD home-grown
IPIs that spinwait for delivery. Having the single mutex means that
the spinloop to aquire it will enable interrupts periodically, thus
avoiding the cross-ipi deadlock.

Obtained from: dwhite, alc
Reviewed by: jhb


# a9cb79ba 07-Aug-2004 Alan Cox <alc@FreeBSD.org>

With the advent of pmap locking it makes sense for pmap_copy() to be less
forgiving about inconsistencies in the source pmap. Also, remove a new-
line character terminating a nearby panic string.


# 0e5a07e5 04-Aug-2004 John Baldwin <jhb@FreeBSD.org>

Remove a potential deadlock on i386 SMP by changing the lazypmap ipi and
spin-wait code to use the same spin mutex (smp_tlb_mtx) as the TLB ipi
and spin-wait code snippets so that you can't get into the situation of
one CPU doing a TLB shootdown to another CPU that is doing a lazy pmap
shootdown each of which are waiting on each other. With this change, only
one of the CPUs would do an IPI and spin-wait at a time.


# 1b3b9cfe 04-Aug-2004 Alan Cox <alc@FreeBSD.org>

Post-locking clean up/simplification, particularly, the elimination of
vm_page_sleep_if_busy() and the page table page's busy flag as a
synchronization mechanism on page table pages.

Also, relocate the inline pmap_unwire_pte_hold() so that it can be used
to shorten _pmap_unwire_pte_hold() on alpha and amd64. This places
pmap_unwire_pte_hold() next to a comment that more accurately describes
it than _pmap_unwire_pte_hold().


# c6bf9f04 31-Jul-2004 Alan Cox <alc@FreeBSD.org>

Add pmap locking to pmap_object_init_pt().


# a0879143 29-Jul-2004 Alan Cox <alc@FreeBSD.org>

Advance the state of pmap locking on alpha, amd64, and i386.

- Enable recursion on the page queues lock. This allows calls to
vm_page_alloc(VM_ALLOC_NORMAL) and UMA's obj_alloc() with the page
queues lock held. Such calls are made to allocate page table pages
and pv entries.
- The previous change enables a partial reversion of vm/vm_page.c
revision 1.216, i.e., the call to vm_page_alloc() by vm_page_cowfault()
now specifies VM_ALLOC_NORMAL rather than VM_ALLOC_INTERRUPT.
- Add partial locking to pmap_copy(). (As a side-effect, pmap_copy()
should now be faster on i386 SMP because it no longer generates IPIs
for TLB shootdown on the other processors.)
- Complete the locking of pmap_enter() and pmap_enter_quick(). (As of now,
all changes to a user-level pmap on alpha, amd64, and i386 are performed
with appropriate locking.)


# e1021dde 21-Jul-2004 Olivier Houchard <cognet@FreeBSD.org>

Using NULL as a malloc type when calling contigmalloc() is wrong, so introduce
a new malloc type, and use it.


# aec86de4 18-Jul-2004 Alan Cox <alc@FreeBSD.org>

Utilize pmap_pte_quick() rather than pmap_pte() in pmap_protect(). The
reason being that pmap_pte_quick() requires the page queues lock, which is
already held, rather than Giant.


# b73cfbb3 17-Jul-2004 Alan Cox <alc@FreeBSD.org>

Remedy my omission of one change in the prevision revision: pmap_remove()
must pin the current thread in order to call pmap_pte_quick().


# c9829537 17-Jul-2004 Alan Cox <alc@FreeBSD.org>

- Utilize pmap_pte_quick() rather than pmap_pte() in pmap_remove() and
pmap_remove_page(). The reason being that pmap_pte_quick() requires
the page queues lock, which is already held, rather than Giant.
- Assert that the page queues lock is held in pmap_remove_page() and
pmap_remove_pte().


# 3d2e54c3 15-Jul-2004 Alan Cox <alc@FreeBSD.org>

Push down the acquisition and release of the page queues lock into
pmap_protect() and pmap_remove(). In general, they require the lock in
order to modify a page's pv list or flags. In some cases, however,
pmap_protect() can avoid acquiring the lock.


# ce8da309 12-Jul-2004 Alan Cox <alc@FreeBSD.org>

Push down the acquisition and release of the page queues lock into
pmap_remove_pages(). (The implementation of pmap_remove_pages() is
optional. If pmap_remove_pages() is unimplemented, the acquisition and
release of the page queues lock is unnecessary.)

Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().


# 26a96556 07-Jul-2004 Alan Cox <alc@FreeBSD.org>

Simplify the control flow in pmap_extract(), enabling the elimination of a
PMAP_UNLOCK() call.


# 03c0ca74 06-Jul-2004 Alan Cox <alc@FreeBSD.org>

White space and style changes only.


# c9a217d2 05-Jul-2004 Alan Cox <alc@FreeBSD.org>

Style changes to pmap_extract().


# 1b74731b 29-Jun-2004 Peter Wemm <peter@FreeBSD.org>

Fix leftover argument to pmap_unuse_pt(). I committed the wrong diff.

Submmitted by: Jon Noack <noackjr@alumni.rice.edu>


# 654bd0e8 29-Jun-2004 Peter Wemm <peter@FreeBSD.org>

Reduce the size of pv entries by 15%. This saves 1MB of KVA for mapping
pv entries per 1GB of user virtual memory. (eg: if we had 1GB file was
mmaped into 30 processes, that would theoretically reduce the KVA demand by
30MB for pv entries. In reality though, we limit pv entries so we don't
have that many at once.)

We used to store the vm_page_t for the page table page. But we recently
had the pa of the ptp, or can calculate it fairly quickly. If we wanted
to avoid the shift/mask operation in pmap_pde(), we could recover the
pa but that means we have to store it for a while.

This does not measurably change performance.

Suggested by: alc
Tested by: alc


# df68e345 26-Jun-2004 Alan Cox <alc@FreeBSD.org>

In case pmap_extract_and_hold() is ever performed on a different pmap than
the current one, we need to pin the current thread to its CPU.

Submitted by: tegge@


# 9eb31321 22-Jun-2004 Alan Cox <alc@FreeBSD.org>

Implement the protection check required by the pmap_extract_and_hold()
specification. This enables the elimination of Giant from that function.

Reviewed by: tegge@


# dc8beb53 20-Jun-2004 Alan Cox <alc@FreeBSD.org>

- Simplify pmap_remove_pages(), eliminating unnecessary indirection.
- Simplify the locking of pmap_is_modified() by converting control flow to
data flow.


# 1ec4b759 20-Jun-2004 Alan Cox <alc@FreeBSD.org>

Add pmap locking to pmap_is_prefaultable().


# 785f2cdf 19-Jun-2004 Alan Cox <alc@FreeBSD.org>

Remove unused pt_entry_ts. Remove an unneeded semicolon.


# d45f21f3 17-Jun-2004 Alan Cox <alc@FreeBSD.org>

Do not preset PG_BUSY on VM_ALLOC_NOOBJ pages. Such pages are not
accessible through an object. Thus, PG_BUSY serves no purpose.


# 4d831945 16-Jun-2004 Alan Cox <alc@FreeBSD.org>

MFamd64
Introduce pmap locking to many of the pmap functions.


# 1e82a3d1 15-Jun-2004 Alan Cox <alc@FreeBSD.org>

MFamd64
Remove dead or unneeded code, e.g., spl calls.


# 7b9d4744 15-Jun-2004 Alan Cox <alc@FreeBSD.org>

Remove a stale comment.


# 7881f950 13-Jun-2004 Alan Cox <alc@FreeBSD.org>

Prevent the loss of a PG_M bit through an SMP race in pmap_ts_referenced().


# 2d0dc0fc 12-Jun-2004 Alan Cox <alc@FreeBSD.org>

In a multiprocessor, the PG_W bit in the pte must be changed atomically.
Otherwise, the setting of the PG_M bit by one processor could be lost if
another processor is simultaneously changing the PG_W bit.

Reviewed by: tegge@


# 662d471d 28-May-2004 Alan Cox <alc@FreeBSD.org>

Remove a broken micro-optimization from pmap_enter(). The ill effect
of this micro-optimization occurs when we call pmap_enter() to wire an
already mapped page. Because of the micro-optimization, we fail to
mark the PTE as wired. Later, on teardown of the address space,
pmap_remove_pages() destroys the PTE before vm_fault_unwire() has
unwired the page. (pmap_remove_pages() is not supposed to destroy
wired PTEs. They are destroyed by a later call to pmap_remove().)
Thus, the page becomes lost.

Note: The page is not lost if the application called munlock(2), only
if it relies on teardown of the address space to unwire its pages.

For the historically inclined, this bug was introduced by a
megacommit, revision 1.182, roughly six years ago.

Leak observed by: green@ and dillon independently
Patch submitted by: dillon at backplane dot com
Reviewed by: tegge@
MFC after: 1 week


# 0af0eeac 10-Apr-2004 Alan Cox <alc@FreeBSD.org>

- pmap_kenter_temporary()'s first parameter, which is a physical address,
should be declared as vm_paddr_t not vm_offset_t.


# c8607538 04-Apr-2004 Alan Cox <alc@FreeBSD.org>

Remove avail_start on those platforms that no longer use it. (Only amd64
does anything with it beyond simple initialization.)


# bdb93eb2 04-Apr-2004 Alan Cox <alc@FreeBSD.org>

Remove unused arguments from pmap_init().


# fcffa790 07-Mar-2004 Alan Cox <alc@FreeBSD.org>

Retire pmap_pinit2(). Alpha was the last platform that used it. However,
ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps,
there has been no reason for having this function on Alpha. Briefly,
when pmap_growkernel() relied upon the list of all processes to find and
update the various pmaps to reflect a growth in the kernel's valid
address space, pmap_init2() served to avoid a race between pmap
initialization and pmap_growkernel(). Specifically, pmap_pinit2() was
responsible for initializing the kernel portions of the pmap and
pmap_pinit2() was called after the process structure contained a pointer
to the new pmap for use by pmap_growkernel(). Thus, an update to the
kernel's address space might be applied to the new pmap unnecessarily,
but an update would never be lost.


# 8600a412 01-Feb-2004 Alan Cox <alc@FreeBSD.org>

Eliminate all TLB shootdowns by pmap_pte_quick(): By temporarily pinning
the thread that calls pmap_pte_quick() and by virtue of the page queues
lock being held, we can manage PADDR1/PMAP1 as a CPU private mapping.

The most common effect of this change is to reduce the overhead of the page
daemon on multiprocessors.

In collaboration with: tegge


# f67fdc73 25-Jan-2004 Jeff Roberson <jeff@FreeBSD.org>

- Now that both schedulers support temporary cpu pinning use this rather
than the switchin functions to guarantee that we're operating with the
correct tlb entry.
- Remove the post copy/zero tlb invalidations. It is faster to invalidate
an entry that is known to exist and so it is faster to invalidate after
use. However, some architectures implement speculative page table
prefetching so we can not be guaranteed that the invalidated entry is still
invalid when we re-enter any of these functions. As a result of this we
must always invalidate before use to be safe.


# 7fac6d07 10-Jan-2004 Alan Cox <alc@FreeBSD.org>

Include "opt_cpu.h" and related #ifdef's for SSE so that pagezero()
actually includes the call to sse2_pagezero().


# a41c6c2a 28-Dec-2003 Alan Cox <alc@FreeBSD.org>

Don't bother clearing PG_ZERO on the page table page in _pmap_allocpte();
it serves no purpose.


# 8eaddab1 27-Dec-2003 Alan Cox <alc@FreeBSD.org>

Don't bother clearing and setting PG_BUSY on page table directory pages.


# 925692ca 21-Dec-2003 Alan Cox <alc@FreeBSD.org>

- Significantly reduce the number of preallocated pv entries in
pmap_init(). Such a large preallocation is unnecessary and wastes
nearly eight megabytes of kernel virtual address space per gigabyte
of managed physical memory.
- Increase UMA_BOOT_PAGES by two. This enables the removal of
pmap_pv_allocf(). (Note: this function was only used during
initialization, specifically, after pmap_init() but before
pmap_init2(). During pmap_init2(), a new allocator is installed.)


# 1c0e8644 18-Dec-2003 John Baldwin <jhb@FreeBSD.org>

MFamd64: Remove i386_protection_init() and the protection_codes[] array
and replace them with a simple if test to turn on PG_RW. i386 != vax.


# 5e4a2fc9 07-Nov-2003 Alan Cox <alc@FreeBSD.org>

- Similar to post-PAE RELENG_4 split pmap_pte_quick() into two cases,
pmap_pte() and pmap_pte_quick(). The distinction being based upon the
locks that are held by the caller. When the given pmap is not the
current pmap, pmap_pte() should be used when Giant is held and
pmap_pte_quick() should be used when the vm page queues lock is held.
- When assigning to PMAP1 or PMAP2, include PG_A anf PG_M.
- Reenable the inlining of pmap_is_current().

In collaboration with: tegge


# 147ad8d5 03-Nov-2003 John Baldwin <jhb@FreeBSD.org>

New i386 SMP code:

- The MP code no longer knows anything specific about an MP Table.
Instead, the local APIC code adds CPUs via the cpu_add() function when
a local APIC is enumerated by an APIC enumerator.
- Don't divide the argument to mp_bootaddress() by 1024 just so that we
can turn around and mulitply it by 1024 again.
- We no longer panic if SMP is enabled but we are booted on a UP machine.
- init_secondary(), the asm code between init_secondary() and ap_init()
in mpboot.s and ap_init() have all been merged together in C into
init_secondary().
- We now use the cpuid feature bits to determine if we should enable
PSE, PGE, or VME on each AP.
- Due to the change in the implementation of critical sections, acquire
the SMP TLB mutex around a slightly larger chunk of code for TLB
shootdowns.
- Remove some of the debug code from the original SMP implementation
that is no longer used or no longer applies to the new APIC code.
- Use a temporary hack to disable the ACPI module until the SMP code has
been further reorganized to allow ACPI to work as a module again.
- Add a DDB command to dump the interesting contents of the IDT.


# cd7ccabe 31-Oct-2003 John Baldwin <jhb@FreeBSD.org>

For physical address regions between 0 and KERNLOAD, allow pmap_mapdev()
to use the direct mapped KVA at KERNBASE to service the request. This also
allows pmap_mapdev() to be used for such addresses very early during the
boot process and might provide some small savings on KVA.

Reviewed by: peter


# d49aa135 30-Oct-2003 Peter Wemm <peter@FreeBSD.org>

Change the pmap_invalidate_xxx() functions so they test against
pmap == kernel_pmap rather than pmap->pm_active == -1. gcc's inliner
can remove more code that way. Only kernel_pmap has a pm_active of -1.


# 2e81e660 27-Oct-2003 John Baldwin <jhb@FreeBSD.org>

Fix pmap_unmapdev() to call pmap_kremove() instead of implementing it
directly so that it more closely mirrors pmap_mapdev() which calls
pmap_kenter().


# f3075be8 25-Oct-2003 Peter Wemm <peter@FreeBSD.org>

For the SMP case, flush the TLB at the beginning of the page zero/copy
routines. Otherwise we run into trouble with speculative tlb preloads
on SMP systems. This effectively defeats Jeff's revision 1.438
optimization (for his pentium4-M laptop) in the SMP case. It breaks
other systems, particularly athlon-MP's.


# b09a77f5 24-Oct-2003 Peter Wemm <peter@FreeBSD.org>

GC workaround code for detecting pentium4's and disabling PSE and PG_G.
It's been ifdef'ed out for ages.


# 6f366f87 14-Oct-2003 Peter Wemm <peter@FreeBSD.org>

Get some more data if we hit the pmap_enter() thing.


# 50ac3f99 12-Oct-2003 Alan Cox <alc@FreeBSD.org>

- Modify pmap_is_current() to return FALSE when a pmap's page table is in
use because a kernel thread is borrowing it. The borrowed page table
can change spontaneously, making any dependence on its continued use
subject to a race condition.
- _pmap_unwire_pte_hold() cannot use pmap_is_current(): If a change is
made to a page table page mapping for a borrowed page table, the TLB
must be updated.

In collaboration with: tegge


# 9b993f82 12-Oct-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Initialize CMAP3 to 0


# ab87e2fb 04-Oct-2003 Alan Cox <alc@FreeBSD.org>

Don't bother setting a page table page's valid field. It is unused and
not setting it is consistent with other uses of VM_ALLOC_NOOBJ pages.


# 47804290 04-Oct-2003 Jeff Roberson <jeff@FreeBSD.org>

- The proper test is CPU_ENABLE_SSE and not CPU_ENABLED_SSE. This
effectively disabled the sse2_pagezero() code.

Spotted by: bde


# 566526a9 03-Oct-2003 Alan Cox <alc@FreeBSD.org>

Migrate pmap_prefault() into the machine-independent virtual memory layer.

A small helper function pmap_is_prefaultable() is added. This function
encapsulate the few lines of pmap_prefault() that actually vary from
machine to machine. Note: pmap_is_prefaultable() and pmap_mincore() have
much in common. Going forward, it's worth considering their merger.


# 6ccf265b 01-Oct-2003 Peter Wemm <peter@FreeBSD.org>

Commit Bosko's patch to clean up the PSE/PG_G initialization to and
avoid problems with some Pentium 4 cpus and some older PPro/Pentium2
cpus. There are several problems, some documented in Intel errata.
This patch:
1) moves the kernel to the second page in the PSE case. There is an
errata that says that you Must Not point a 4MB page at physical
address zero on older cpus. We avoided bugs here due to sheer luck.
2) sets up PSE page tables right from the start in locore, rather than
trying to switch from 4K to 4M (or 2M) pages part way through the boot
sequence at the same time that we're messing with PG_G.

For some reason, the pmap work over the last 18 months seems to tickle
the problems, and the PAE infrastructure changes disturb the cpu
bugs even more.

A couple of people have reported a problem with APM bios calls during
boot. I'll work with people to get this resolved.

Obtained from: bmilekic


# 1419773d 30-Sep-2003 Jeff Roberson <jeff@FreeBSD.org>

- Hide more #ifdef logic in a new invlcaddr inline. This function flushes
the full tlb if you're on an I386or does an invlpg otherwise.

Glanced at by: peter


# 043407f8 30-Sep-2003 Jeff Roberson <jeff@FreeBSD.org>

- Define an inline pagezero() to select the appropriate full-page zeroing
function from one of bzero, i686_pagezero, or sse2_pagezero.
- Use pagezero() in the three pmap functions that need to zero full pages.


# fb9bde2d 30-Sep-2003 Jeff Roberson <jeff@FreeBSD.org>

- Correct a problem with the last commit. The CMAP ptes need to be zeroed
prior to invalidating the TLB to be certain that the processor doesn't
keep a cached copy.

Discussed with: pete
Paniced: tegge
Pointy Hat: The usual spot


# fa3f9daa 30-Sep-2003 Jeff Roberson <jeff@FreeBSD.org>

- On my Pentium4-M laptop, invalpg takes ~1100 cycles if the page is found in
the TLB and ~1600 if it is not. Therefore, it is more effecient to
invalidate the TLB after operations that use CMAP rather than before.
- So that the tlb is invalidated prior to switching off of a processor, we
must change the switchin functions to switchout functions.
- Remove td_switchout from the thread and move it to the x86 pcb.
- Move the code that calls switchout into swtch.s. These changes make this
optimization truely x86 specific.


# 4487ff65 26-Sep-2003 Alan Cox <alc@FreeBSD.org>

Addendum to the previous revision: If vm_page_alloc() for the page
table page fails, perform a VM_WAIT; update some comments in
_pmap_allocpte().


# f3fd831c 24-Sep-2003 Alan Cox <alc@FreeBSD.org>

- Eliminate the pte object.
- Use kmem_alloc_nofault() rather than kmem_alloc_pageable() to allocate
KVA space for the page directory page(s). Submitted by: tegge


# be19fdd1 21-Sep-2003 Alan Cox <alc@FreeBSD.org>

Allocate the page table directory page(s) as "no object" pages. (This
leaves one explicit use of the pte object.)


# f8363bde 20-Sep-2003 Alan Cox <alc@FreeBSD.org>

Reimplement pmap_release() such that it uses the page table rather than the
pte object to locate the page table directory pages. (This is another step
toward the elimination of the pte object.)


# 6d66d714 13-Sep-2003 Alan Cox <alc@FreeBSD.org>

Simplify (and micro-optimize) pmap_unuse_pt(): Only one caller,
pmap_remove_pte(), passed NULL instead of the required page table
page to pmap_unuse_pt(). Compute the necessary page table page
in pmap_remove_pte(). Also, remove some unreachable code from
pmap_remove_pte().


# b9850eb2 12-Sep-2003 Alan Cox <alc@FreeBSD.org>

Add a new parameter to pmap_extract_and_hold() that is needed to eliminate
Giant from vmapbuf().

Idea from: tegge


# ba2157f2 07-Sep-2003 Alan Cox <alc@FreeBSD.org>

Introduce a new pmap function, pmap_extract_and_hold(). This function
atomically extracts and holds the physical page that is associated with the
given pmap and virtual address. Such a function is needed to make the
memory mapping optimizations used by, for example, pipes and raw disk I/O
MP-safe.

Reviewed by: tegge


# a7b60ab2 25-Aug-2003 David E. O'Brien <obrien@FreeBSD.org>

Fix copyright comment & FBSDID style nits.

Requested by: bde


# 3c5a69f7 21-Aug-2003 Alan Cox <alc@FreeBSD.org>

Eliminate the last (direct) use of vm_page_lookup() on the pte object.


# 1e584e46 19-Aug-2003 Alan Cox <alc@FreeBSD.org>

Eliminate a possible race condition for multithreaded applications in
_pmap_allocpte(): Guarantee that the page table page is zero filled before
adding it to the directory. Otherwise, a 2nd, 3rd, etc. thread could
access a nearby virtual address and use garbage for the address
translation.

Discussed with: peter, tegge


# 90ca070d 17-Aug-2003 Alan Cox <alc@FreeBSD.org>

Acquire the pte object's mutex when performing vm_page_grab(). Note: It is
my long-term objective to eliminate the pte object. In the near term, this
does, however, enable the addition of some vm object locking assertions.


# 365b27ea 16-Aug-2003 Alan Cox <alc@FreeBSD.org>

In pmap_copy(), since we have the page table page's physical address
in hand, use PHYS_TO_VM_PAGE() rather than vm_page_lookup().


# d8df7ab7 13-Aug-2003 Alan Cox <alc@FreeBSD.org>

Eliminate pmap_page_lookup() and its uses. Instead, use PHYS_TO_VM_PAGE()
to convert the pte's physical address into a vm page.

Reviewed by: peter


# ba97fd8a 10-Aug-2003 Alan Cox <alc@FreeBSD.org>

Rename pmap_changebit() to pmap_clear_ptes() and remove the last
parameter. The new name better reflects what the function does and
how it is used. The last parameter was always FALSE.

Note: In theory, gcc would perform constant propagation and dead code
elimination to achieve the same effect as removing the last parameter,
which is always FALSE. In practice, recent versions do not. So, there
is little point in letting unused code pessimize execution.


# 2c2464cb 06-Aug-2003 Alan Cox <alc@FreeBSD.org>

Correct a mistake in the previous revision: Reduce the scope of the page
queues lock such that it isn't held around the call to get_pv_entry(),
which calls uma_zalloc(). At the point of the call to get_pv_entry(), the
lock isn't necessary and holding it could lead to recursive acquisition,
which isn't allowed.


# b0b2803a 06-Aug-2003 Alan Cox <alc@FreeBSD.org>

Acquire the page queues lock in pmap_insert_entry(). (I used to believe
that the page's busy flag could be relied upon to synchronize access to the
pv list. I don't any longer. See, for example, the call to
pmap_insert_entry() from pmap_copy().)


# 195d68e5 02-Aug-2003 Alan Cox <alc@FreeBSD.org>

- Use kmem_alloc_nofault() rather than kmem_alloc_pageable() in
pmap_mapdev(). See revision 1.140 of kern/sys_pipe.c for a detailed
rationale. Submitted by: tegge
- Remove GIANT_REQUIRED from pmap_mapdev().


# b053bc84 30-Jul-2003 Bosko Milekic <bmilekic@FreeBSD.org>

Make sure that when the PV ENTRY zone is created in pmap, that it's
created not only with UMA_ZONE_VM but also with UMA_ZONE_NOFREE. In
the i386 case in particular, the pmap code would hook a special
page allocation routine that allocated from kernel_map and not kmem_map,
and so when/if the pageout daemon drained the zones, it could actually
push out slabs from the PV ENTRY zone but call UMA's default page_free,
which resulted in pages allocated from kernel_map being freed to
kmem_map; bad. kmem_free() ignores the return value of the
vm_map_delete and just returns. I'm not sure what the exact
repercussions could be, but it doesn't look good.

In the PAE case on i386, we also set-up a zone in pmap, so be
conservative for now and make that zone also ZONE_NOFREE and
ZONE_VM. Do this for the pmap zones for the other archs too,
although in some cases it may not be entirely necessarily. We'd
rather be safe than sorry at this point.

Perhaps all UMA_ZONE_VM zones should by default be also
UMA_ZONE_NOFREE?

May fix some of silby's crashes on the PV ENTRY zone.


# 34621500 23-Jul-2003 Alan Cox <alc@FreeBSD.org>

Annotate pmap_changebit() as __always_inline. This function was
written as a template that when inlined is specialized for the caller
through constant value propagation and dead code elimination. Thus,
the specialized code that is generated for pmap_clear_reference() et
al. avoids several conditional branches inside of a loop.


# e95babf3 09-Jul-2003 Peter Wemm <peter@FreeBSD.org>

unifdef -DLAZY_SWITCH and start to tidy up the associated glue.


# 90a7c7b6 08-Jul-2003 Alan Cox <alc@FreeBSD.org>

In pmap_object_init_pt(), the pmap_invalidate_all() should be performed on
the caller-provided pmap, not the kernel_pmap. Using the kernel_pmap
results in an unnecessary IPI for TLB shootdown on SMPs.

Reviewed by: jake, peter


# 4041408f 04-Jul-2003 Alan Cox <alc@FreeBSD.org>

Add vm object locking to pmap_prefault().


# 1f78f902 03-Jul-2003 Alan Cox <alc@FreeBSD.org>

Background: pmap_object_init_pt() premaps the pages of a object in
order to avoid the overhead of later page faults. In general, it
implements two cases: one for vnode-backed objects and one for
device-backed objects. Only the device-backed case is really
machine-dependent, belonging in the pmap.

This commit moves the vnode-backed case into the (relatively) new
function vm_map_pmap_enter(). On amd64 and i386, this commit only
amounts to code rearrangement. On alpha and ia64, the new machine
independent (MI) implementation of the vnode case is smaller and more
efficient than their pmap-based implementations. (The MI
implementation takes advantage of the fact that objects in -CURRENT
are ordered collections of pages.) On sparc64, pmap_object_init_pt()
hadn't (yet) been implemented.


# dca96f1a 29-Jun-2003 Alan Cox <alc@FreeBSD.org>

- Export pmap_enter_quick() to the MI VM. This will permit the
implementation of a largely MI pmap_object_init_pt() for vnode-backed
objects. pmap_enter_quick() is implemented via pmap_enter() on sparc64
and powerpc.
- Correct a mismatch between pmap_object_init_pt()'s prototype and its
various implementations. (I plan to keep pmap_object_init_pt() as
the MD hook for device-backed objects on i386 and amd64.)
- Correct an error in ia64's pmap_enter_quick() and adjust its interface
to match the other versions. Discussed with: marcel


# eabd1972 27-Jun-2003 Peter Wemm <peter@FreeBSD.org>

Tidy up leftover lazy_switch instrumentation that is no longer needed.
This cleans up some #ifdef hell.


# b50953cc 27-Jun-2003 Peter Wemm <peter@FreeBSD.org>

Fix the false IPIs on smp when using LAZY_SWITCH caused by pmap_activate()
not releasing the pm_active bit in the old pmap.


# 0e2a4d3a 14-Jun-2003 David Xu <davidxu@FreeBSD.org>

Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.


# 49a2507b 14-Jun-2003 Alan Cox <alc@FreeBSD.org>

Migrate the thread stack management functions from the machine-dependent
to the machine-independent parts of the VM. At the same time, this
introduces vm object locking for the non-i386 platforms.

Two details:

1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The
different machine-dependent implementations used various combinations
of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable guard page, set
KSTACK_GUARD_PAGES to 0.

2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In
5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed
to vm_page_alloc() or vm_page_grab().


# 89f4fca2 14-Jun-2003 Alan Cox <alc@FreeBSD.org>

Move the *_new_altkstack() and *_dispose_altkstack() functions out of the
various pmap implementations into the machine-independent vm. They were
all identical.


# b26ce6a4 13-Jun-2003 Alan Cox <alc@FreeBSD.org>

Add vm object locking to pmap_object_init_pt().


# 8630c117 12-Jun-2003 Alan Cox <alc@FreeBSD.org>

Add vm object locking to various pagers' "get pages" methods, i386 stack
management functions, and a u area management function.


# 9676a785 02-Jun-2003 David E. O'Brien <obrien@FreeBSD.org>

Use __FBSDID().


# 14ce5bd4 28-Apr-2003 Jake Burkholder <jake@FreeBSD.org>

Use inlines for loading and storing page table entries. Use cmpxchg8b for
the PAE case to ensure idempotent 64 bit loads and stores.

Sponsored by: DARPA, Network Associates Laboratories


# ffad008f 25-Apr-2003 Jake Burkholder <jake@FreeBSD.org>

Remove harmless invalid cast.

Sponsored by: DARPA, Network Associates Laboratories


# 163529c2 03-Apr-2003 Jake Burkholder <jake@FreeBSD.org>

- Removed APTD and associated macros, it is no longer used.

BANG BANG BANG etc.

Sponsored by: DARPA, Network Associates Laboratories


# cc66ebe2 02-Apr-2003 Peter Wemm <peter@FreeBSD.org>

Commit a partial lazy thread switch mechanism for i386. it isn't as lazy
as it could be and can do with some more cleanup. Currently its under
options LAZY_SWITCH. What this does is avoid %cr3 reloads for short
context switches that do not involve another user process. ie: we can
take an interrupt, switch to a kthread and return to the user without
explicitly flushing the tlb. However, this isn't as exciting as it could
be, the interrupt overhead is still high and too much blocks on Giant
still. There are some debug sysctls, for stats and for an on/off switch.

The main problem with doing this has been "what if the process that you're
running on exits while we're borrowing its address space?" - in this case
we use an IPI to give it a kick when we're about to reclaim the pmap.

Its not compiled in unless you add the LAZY_SWITCH option. I want to fix a
few more things and get some more feedback before turning it on by default.

This is NOT a replacement for Bosko's lazy interrupt stuff. This was more
meant for the kthread case, while his was for interrupts. Mine helps a
little for interrupts, but his helps a lot more.

The stats are enabled with options SWTCH_OPTIM_STATS - this has been a
pseudo-option for years, I just added a bunch of stuff to it.

One non-trivial change was to select a new thread before calling
cpu_switch() in the first place. This allows us to catch the silly
case of doing a cpu_switch() to the current process. This happens
uncomfortably often. This simplifies a bit of the asm code in cpu_switch
(no longer have to call choosethread() in the middle). This has been
implemented on i386 and (thanks to jake) sparc64. The others will come
soon. This is actually seperate to the lazy switch stuff.

Glanced at by: jake, jhb


# 7ab9b220 29-Mar-2003 Jake Burkholder <jake@FreeBSD.org>

- Add support for PAE and more than 4 gigs of ram on x86, dependent on the
kernel opition 'options PAE'. This will only work with device drivers which
either use busdma, or are able to handle 64 bit physical addresses.

Thanks to Lanny Baron from FreeBSD Systems for the loan of a test machine
with 6 gigs of ram.

Sponsored by: DARPA, Network Associates Laboratories, FreeBSD Systems


# de54353f 29-Mar-2003 Jake Burkholder <jake@FreeBSD.org>

- Remove invalid casts.

Sponsored by: DARPA, Network Associates Laboratories


# aea57872 29-Mar-2003 Jake Burkholder <jake@FreeBSD.org>

- Convert all uses of pmap_pte and get_ptbase to pmap_pte_quick. When
accessing an alternate address space this causes 1 page table page at
a time to be mapped in, rather than using the recursive mapping technique
to map in an entire alternate address space. The recursive mapping
technique changes large portions of the address space and requires global
tlb flushes, which seem to cause problems when PAE is enabled. This will
also allow IPIs to be avoided when mapping in new page table pages using
the same technique as is used for pmap_copy_page and pmap_zero_page.

Sponsored by: DARPA, Network Associates Laboratories


# 227f9a1c 24-Mar-2003 Jake Burkholder <jake@FreeBSD.org>

- Add vm_paddr_t, a physical address type. This is required for systems
where physical addresses larger than virtual addresses, such as i386s
with PAE.
- Use this to represent physical addresses in the MI vm system and in the
i386 pmap code. This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
detection code, and due to kvtop returning vm_paddr_t instead of u_long.

Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.

Sponsored by: DARPA, Network Associates Laboratories
Discussed with: re, phk (cdevsw change)


# 4d3f408c 12-Mar-2003 Jake Burkholder <jake@FreeBSD.org>

- Added support for multiple page directory pages to pmap_pinit and
pmap_release.
- Merged pmap_release and pmap_release_free_page. When pmap_release is
called only the page directory page(s) can be left in the pmap pte object,
since all page table pages will have been freed by pmap_remove_pages and
pmap_remove. In addition, there can only be one reference to the pmap and
the page directory is wired, so the page(s) can never be busy. So all there
is to do is clear the magic mappings from the page directory and free the
page(s).

Sponsored by: DARPA, Network Associates Laboratories


# ac2e4153 26-Feb-2003 Julian Elischer <julian@FreeBSD.org>

Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.


# 0f1a7e05 25-Feb-2003 Jake Burkholder <jake@FreeBSD.org>

- Added inlines pmap_is_current, pmap_is_alternate and pmap_set_alternate
for testing and setting the current and alternate address spaces.
- Changed PTDpde and APTDpde to arrays to support multiple page directory
pages.

ponsored by: DARPA, Network Associates Laboratories


# 07159f9c 24-Feb-2003 Maxime Henrion <mux@FreeBSD.org>

Cleanup of the d_mmap_t interface.

- Get rid of the useless atop() / pmap_phys_address() detour. The
device mmap handlers must now give back the physical address
without atop()'ing it.
- Don't borrow the physical address of the mapping in the returned
int. Now we properly pass a vm_offset_t * and expect it to be
filled by the mmap handler when the mapping was successful. The
mmap handler must now return 0 when successful, any other value
is considered as an error. Previously, returning -1 was the only
way to fail. This change thus accidentally fixes some devices
which were bogusly returning errno constants which would have been
considered as addresses by the device pager.
- Garbage collect the poorly named pmap_phys_address() now that it's
no longer used.
- Convert all the d_mmap_t consumers to the new API.

I'm still not sure wheter we need a __FreeBSD_version bump for this,
since and we didn't guarantee API/ABI stability until 5.1-RELEASE.

Discussed with: alc, phk, jake
Reviewed by: peter
Compile-tested on: LINT (i386), GENERIC (alpha and sparc64)
Runtime-tested on: i386


# 28c9e1aa 23-Feb-2003 Jake Burkholder <jake@FreeBSD.org>

Use the direct mapping of IdlePTD setup in locore for proc0's page directory,
instead of allocating another page of kva and mapping it in again. This was
likely an oversight in revision 1.174 (cut and paste from pmap_pinit).

Discussed with: peter, tegge
Sponsored by: DARPA, Network Associates Laboratories


# 910548de 23-Feb-2003 Jake Burkholder <jake@FreeBSD.org>

- Added macros NPGPTD, NBPTD, and NPDEPTD, for dealing with the size of the
page directory.
- Use these instead of the magic constants 1 or PAGE_SIZE where appropriate.
There are still numerous assumptions that the page directory is exactly
1 page.

Sponsored by: DARPA, Network Associates Laboratories


# e29632c9 23-Feb-2003 Jake Burkholder <jake@FreeBSD.org>

- Added macros PDESHIFT and PTESHIFT, use these instead of magic constants
in locore.
- Removed the macros PTESIZE and PDESIZE, use sizeof instead in C.

Sponsored by: DARPA, Network Associates Laboratories


# 01a06ce2 22-Feb-2003 Alan Cox <alc@FreeBSD.org>

The root of the splay tree maintained within the pm_pteobj always refers
to the last accessed pte page. Thus, the pm_ptphint is redundant and can
be removed.


# 8e42580d 15-Feb-2003 Alan Cox <alc@FreeBSD.org>

Assert that the kernel map's system mutex is held in pmap_growkernel().


# e33d37b6 14-Feb-2003 Alan Cox <alc@FreeBSD.org>

- Add a mutex for synchronizing the use of CMAP/CADDR 1 and 2.
- Eliminate small style differences between pmap_zero_page(),
pmap_copy_page(), etc.


# 939a4397 12-Feb-2003 Peter Wemm <peter@FreeBSD.org>

Oops. I mis-remembered about the P4 problems. It was 5.0-DP2 that
was shipped with DISABLE_PG_G and DISABLE_PSE, not 5.0-REL. *blush*
Disable the code - but still leave it there in case its still lurking.


# 521871f1 12-Feb-2003 Peter Wemm <peter@FreeBSD.org>

Turn of PG_PS and PG_G for Pentium-4 cpus at boot time. This is so
that we can stop turning off PG_G and PG_PS globally for releases.


# 393a225c 11-Feb-2003 Alan Cox <alc@FreeBSD.org>

Remove kptobj. Instead, use VM_ALLOC_NOOBJ.


# 571cd8a1 07-Feb-2003 Alan Cox <alc@FreeBSD.org>

MF alpha
- Synchronize access to the allpmaps list with a mutex.


# e2294003 06-Feb-2003 Peter Wemm <peter@FreeBSD.org>

Commit some cosmetic changes I had laying around and almost included
with another commit. Unwrap a line. Unexpand a pmap_kenter().


# ca380469 02-Feb-2003 Alan Cox <alc@FreeBSD.org>

- Make allpmaps static.
- Use atomic subtract to update the global wired pages count. (See
also vm/vm_page.c revision 1.233.)
- Assert that the page queue lock is held in pmap_remove_entry().


# d6d92c84 27-Jan-2003 Alan Cox <alc@FreeBSD.org>

Merge pmap_testbit() and pmap_is_modified(). The latter is the only caller
of the former.


# 9d5abbdd 01-Jan-2003 Jens Schweikhardt <schweikh@FreeBSD.org>

Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.


# 84cdcd85 27-Dec-2002 Alan Cox <alc@FreeBSD.org>

Assert that the page queues lock is held in pmap_testbit().


# 11a2911c 24-Dec-2002 Alan Cox <alc@FreeBSD.org>

- Hold the page queues lock around calls to vm_page_wakeup() and
vm_page_flag_clear().


# 0ced3981 14-Dec-2002 Alan Cox <alc@FreeBSD.org>

Add page locking to pmap_mincore().

Submitted (in part) by: tjr@


# f4dcf955 02-Dec-2002 Alan Cox <alc@FreeBSD.org>

Avoid recursive acquisition of the page queues lock in pmap_unuse_pt().

Approved by: re


# d79ebb60 01-Dec-2002 Alan Cox <alc@FreeBSD.org>

Hold the page queues lock when calling pmap_unwire_pte_hold() or
pmap_remove_pte(). Use vm_page_sleep_if_busy() in
_pmap_unwire_pte_hold() so that the page queues lock is released
when sleeping.

Approved by: re (blanket)


# e6c90801 30-Nov-2002 Alan Cox <alc@FreeBSD.org>

Assert that the page queues lock is held in pmap_changebit()
and pmap_ts_referenced().

Approved by: re (blanket)


# 0d51d232 30-Nov-2002 Alan Cox <alc@FreeBSD.org>

Assert that the page queues lock is held in pmap_page_exists_quick().

Approved by: re (blanket)


# ffb30958 24-Nov-2002 Alan Cox <alc@FreeBSD.org>

Assert that the page queues lock is held in pmap_remove_pages().

Approved by: re (blanket)


# 4817d8e5 22-Nov-2002 Alan Cox <alc@FreeBSD.org>

- Assert that the page queues lock is held in pmap_remove_all().
- Fix a diagnostic message and comment in pmap_remove_all().
- Eliminate excessive white space from pmap_remove_all().

Approved by: re


# eea85e9b 12-Nov-2002 Alan Cox <alc@FreeBSD.org>

Move pmap_collect() out of the machine-dependent code, rename it
to reflect its new location, and add page queue and flag locking.

Notes: (1) alpha, i386, and ia64 had identical implementations
of pmap_collect() in terms of machine-independent interfaces;
(2) sparc64 doesn't require it; (3) powerpc had it as a TODO.


# 6372d61e 10-Nov-2002 Alan Cox <alc@FreeBSD.org>

- Clear the page's PG_WRITEABLE flag in the i386's pmap_changebit()
if we're removing write access from the page's PTEs.
- Export pmap_remove_all() on alpha, i386, and ia64. (It's already
exported on sparc64.)


# aa8e11b6 07-Nov-2002 Alan Cox <alc@FreeBSD.org>

Simplify and optimize pmap_object_init_pt(). More specifically,
take advantage of the fact that the vm object's list of pages is
now ordered to reduce the overhead of finding the desired set of
pages to be mapped. (See revision 1.215 of vm/vm_page.c.)


# 316ec49a 02-Oct-2002 Scott Long <scottl@FreeBSD.org>

Some kernel threads try to do significant work, and the default KSTACK_PAGES
doesn't give them enough stack to do much before blowing away the pcb.
This adds MI and MD code to allow the allocation of an alternate kstack
who's size can be speficied when calling kthread_create. Passing the
value 0 prevents the alternate kstack from being created. Note that the
ia64 MD code is missing for now, and PowerPC was only partially written
due to the pmap.c being incomplete there.
Though this patch does not modify anything to make use of the alternate
kstack, acpi and usb are good candidates.

Reviewed by: jake, peter, jhb


# b29d22f9 01-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

The pmap_prefault_pageorder[] array was initialize with wrong values
due to a missing comma.

I have no idea what trouble, if any, this may have caused.

Pointed out by: FlexeLint


# 37c84183 28-Sep-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by: FlexeLint warning #512


# 6508a194 24-Aug-2002 Alan Cox <alc@FreeBSD.org>

o Retire pmap_pageable(). It's an advisory routine that none
of our platforms implements.


# 55f7c614 21-Aug-2002 Archie Cobbs <archie@FreeBSD.org>

Don't use "NULL" when "0" is really meant.


# fe047604 17-Aug-2002 Alan Cox <alc@FreeBSD.org>

o Simplify the ptphint test in pmap_release_free_page(). In other words,
make it just like the test in _pmap_unwire_pte_hold().


# e9ed460a 13-Aug-2002 Alan Cox <alc@FreeBSD.org>

o Remove an unnecessary vm_page_flash() from _pmap_unwire_pte_hold().

Reviewed by: peter


# d837b369 12-Aug-2002 Alan Cox <alc@FreeBSD.org>

o Convert three instances of vm_page_sleep_busy() into vm_page_sleep_if_busy()
with page queue locking.


# d8a0d079 12-Aug-2002 Ian Dowse <iedowse@FreeBSD.org>

Use roundup2() to avoid a problem where pmap_growkernel was unable
to extend the kernel VM to the maximum possible address of 4G-4M.

PR: i386/22441
Submitted by: Bill Carpenter <carp@world.std.com>
Reviewed by: alc


# 0da73705 10-Aug-2002 Alan Cox <alc@FreeBSD.org>

o Remove the setting and clearing of the PG_MAPPED flag. (This flag is
obsolete.)


# c679b309 05-Aug-2002 Peter Wemm <peter@FreeBSD.org>

Revert rev 1.356 and 1.352 (pmap_mapdev hacks). It wasn't worth the
pain.


# 3f3655b0 04-Aug-2002 Peter Wemm <peter@FreeBSD.org>

Fix a mistake in 1.352 - I was returning a pointer to the rounded down
address. I expect this will fix acpica.


# ea5e5b13 03-Aug-2002 Alan Cox <alc@FreeBSD.org>

o Request a wired page from vm_page_grab() in _pmap_allocpte().


# b9c51c91 03-Aug-2002 Alan Cox <alc@FreeBSD.org>

o Ask for a prezeroed page in pmap_pinit() for the page directory page.


# 5da2d6a4 03-Aug-2002 Alan Cox <alc@FreeBSD.org>

o Don't set PG_MAPPED on the page allocated and mapped in _pmap_allocpte().
(Only set this flag if the mapping has a corresponding pv list entry,
which this mapping doesn't.)


# 8f1586dd 02-Aug-2002 Peter Wemm <peter@FreeBSD.org>

Take advantage of the fact that there is a small 1MB direct mapped region
on x86 in between KERNBASE and the kernel load address. pmap_mapdev()
can return pointers to this for devices operating in the isa "hole".


# 64a1b85e 01-Aug-2002 Alan Cox <alc@FreeBSD.org>

o Lock page queue accesses by vm_page_deactivate().


# 239b5b97 31-Jul-2002 Alan Cox <alc@FreeBSD.org>

o Setting PG_MAPPED and PG_WRITEABLE on pages that are mapped and unmapped
by pmap_qenter() and pmap_qremove() is pointless. In fact, it probably
leads to unnecessary pmap_page_protect() calls if one of these pages is
paged out after unwiring.

Note: setting PG_MAPPED asserts that the page's pv list may be
non-empty. Since checking the status of the page's pv list isn't any
harder than checking this flag, the flag should probably be eliminated.
Alternatively, PG_MAPPED could be set by pmap_enter() exclusively
rather than various places throughout the kernel.


# bfd28670 30-Jul-2002 Alan Cox <alc@FreeBSD.org>

o Lock page queue accesses by pmap_release_free_page().


# 14f8ceaa 28-Jul-2002 Alan Cox <alc@FreeBSD.org>

o Pass VM_ALLOC_WIRED to vm_page_grab() rather than calling vm_page_wire()
in pmap_new_thread(), pmap_pinit(), and vm_proc_new().
o Lock page queue accesses by vm_page_free() in pmap_object_init_pt().


# e344afe7 20-Jul-2002 Peter Wemm <peter@FreeBSD.org>

Move SWTCH_OPTIM_STATS related code out of cpufunc.h. (This sort of stat
gathering is not an x86 cpu feature)


# 4aca0b15 19-Jul-2002 Alan Cox <alc@FreeBSD.org>

o Use vm_page_alloc(... | VM_ALLOC_WIRED) in place of vm_page_wire().


# d08c48b4 17-Jul-2002 Peter Wemm <peter@FreeBSD.org>

Avoid trying to set PG_G on the first 4MB when we set up the 4MB page.
This solves the SMP panic for at least one system. I'd still like to know
why my xeon works though.

Tested by: bmilekic


# 700399bc 14-Jul-2002 Alan Cox <alc@FreeBSD.org>

o Lock page queue accesses by vm_page_wire().


# 5c5e3622 13-Jul-2002 Peter Wemm <peter@FreeBSD.org>

Two invlpg's slipped through that were not protected from I386_CPU

Pointed out by: dillon


# 96fd5002 13-Jul-2002 Peter Wemm <peter@FreeBSD.org>

invlpg() does not work too well on i386 cpus. Add token i386 support
back in to the pmap_zero_page* stuff.


# 00649044 13-Jul-2002 Peter Wemm <peter@FreeBSD.org>

Do global shootdowns when switching to/from 4MB pages. I believe we can
do a shootdown on a 4MB "page" though, but this should be safer for now.

Noticed by: tegge


# a7b1f16c 13-Jul-2002 Peter Wemm <peter@FreeBSD.org>

Bandaid for SMP. Changing APTDpde without a global shootdown is not
safe yet. We used to do a global shootdown here anyway so another day
or so shouldn't hurt.


# 753492f4 13-Jul-2002 Alan Cox <alc@FreeBSD.org>

o Lock some page queue accesses, in particular, those by vm_page_unwire().


# fbcf77c2 12-Jul-2002 Matthew Dillon <dillon@FreeBSD.org>

Re-enable the idle page-zeroing code. Remove all IPIs from the idle
page-zeroing code as well as from the general page-zeroing code and use a
lazy tlb page invalidation scheme based on a callback made at the end
of mi_switch.

A number of people came up with this idea at the same time so credit
belongs to Peter, John, and Jake as well.

Two-way SMP buildworld -j 5 tests (second run, after stabilization)
2282.76 real 2515.17 user 704.22 sys before peter's IPI commit
2266.69 real 2467.50 user 633.77 sys after peter's commit
2232.80 real 2468.99 user 615.89 sys after this commit

Reviewed by: peter, jhb
Approved by: peter


# f1b665c8 12-Jul-2002 Peter Wemm <peter@FreeBSD.org>

Revive backed out pmap related changes from Feb 2002. The highlights are:
- It actually works this time, honest!
- Fine grained TLB shootdowns for SMP on i386. IPI's are very expensive,
so try and optimize things where possible.
- Introduce ranged shootdowns that can be done as a single IPI.
- PG_G support for i386
- Specific-cpu targeted shootdowns. For example, there is no sense in
globally purging the TLB cache for where we are stealing a page from
the local unshared process on the local cpu. Use pm_active to track
this.
- Add some instrumentation for the tlb shootdown code.
- Rip out SMP code from <machine/cpufunc.h>
- Try and fix some very bogus PG_G and PG_PS interactions that were bad
enough to cause vm86 bios calls to break. vm86 depended on our existing
bugs and this was the cause of the VESA panics last time.
- Fix the silly one-line error that caused the 'panic: bad pte' last time.
- Fix a couple of other silly one-line errors that should have caused more
pain than they did.

Some more work is needed:
- pmap_{zero,copy}_page[_idle]. These can be done without IPI's if we
have a hook in cpu_switch.
- The IPI handlers need some cleanup. I have a bogus %ds load that can
be avoided.
- APTD handling is rather bogus and appears to be a large source of
global TLB IPI shootdowns for no really good reason.

I see speedups of between 1.5% and ~4% on buildworlds in a while 1 loop.
I expect to see a bigger difference when there is significant pageout
activity or the system otherwise has memory shortages.

I have backed out a few optimizations that I had been using over the last
few days in order to be a little more conservative. I'll revisit these
again over the next few days as the dust settles.

New option: DISABLE_PG_G - In case I missed something.


# 4a0226c6 11-Jul-2002 Peter Wemm <peter@FreeBSD.org>

Unexpand a couple of 8-space indents that I added in rev 1.285.


# a58b3a68 07-Jul-2002 Peter Wemm <peter@FreeBSD.org>

Add a special page zero entry point intended to be called via the single
threaded VM pagezero kthread outside of Giant. For some platforms, this
is really easy since it can just use the direct mapped region. For others,
IPI sending is involved or there are other issues, so grab Giant when
needed.

We still have preemption issues to deal with, but Alan Cox has an
interesting suggestion on how to minimize the problem on x86.

Use Luigi's hack for preserving the (lack of) priority.

Turn the idle zeroing back on since it can now actually do something useful
outside of Giant in many cases.


# 64ca7010 07-Jul-2002 Peter Wemm <peter@FreeBSD.org>

Fix a hideous TLB bug. pmap_unmapdev neglected to remove the device
mappings from the page tables, which were mapped with PG_G! We could
reuse the page table entry for another mapping (pmap_mapdev) but it
would never have cleared any remaining PG_G TLB entries.


# a136efe9 07-Jul-2002 Peter Wemm <peter@FreeBSD.org>

Collect all the (now equivalent) pmap_new_proc/pmap_dispose_proc/
pmap_swapin_proc/pmap_swapout_proc functions from the MD pmap code
and use a single equivalent MI version. There are other cleanups
needed still.

While here, use the UMA zone hooks to keep a cache of preinitialized
proc structures handy, just like the thread system does. This eliminates
one dependency on 'struct proc' being persistent even after being freed.
There are some comments about things that can be factored out into
ctor/dtor functions if it is worth it. For now they are mostly just
doing statistics to get a feel of how it is working.


# 3a7ef791 04-Jul-2002 Peter Wemm <peter@FreeBSD.org>

Diff reduction (microoptimization) with another WIP. Move the frame
calculation in get_ptbase() to a little later on.


# 8692e184 03-Jul-2002 Julian Elischer <julian@FreeBSD.org>

Don't free pages we never allocated..

My eyes openned by: Matt


# 0d6735c6 03-Jul-2002 Julian Elischer <julian@FreeBSD.org>

Slight restatement of the code and remove some unused variables.


# e04d8bf8 03-Jul-2002 Julian Elischer <julian@FreeBSD.org>

Add comments and slightly rearrange the thread stack assignment code
to try make it less obscure.


# b2adb4b2 03-Jul-2002 Julian Elischer <julian@FreeBSD.org>

Remove vestiges of old code...
These functions are always called on new memory so they can
not already be set up, so don't bother testing for that.
(This was left over from before we used UMA (which is cool))


# e602ba25 29-Jun-2002 Julian Elischer <julian@FreeBSD.org>

Part 1 of KSE-III

The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


# c6d84b4d 27-Jun-2002 Andrew R. Reiter <arr@FreeBSD.org>

Fix for the problem stated below by Tor Egge:
(from: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=832566+0+ \
current/freebsd-current)

"Too many pages were prefaulted in pmap_object_init_pt, thus
the wrong physical page was entered in the pmap for the virtual
address where the .dynamic section was supposed to be."

Submitted by: tegge
Approved by: tegge's patches never fail


# 23f09d50 26-Jun-2002 Ian Dowse <iedowse@FreeBSD.org>

Avoid using the 64-bit vm_pindex_t in a few places where 64-bit
types are not required, as the overhead is unnecessary:

o In the i386 pmap_protect(), `sindex' and `eindex' represent page
indices within the 32-bit virtual address space.
o In swp_pager_meta_build() and swp_pager_meta_ctl(), use a temporary
variable to store the low few bits of a vm_pindex_t that gets used
as an array index.
o vm_uiomove() uses `osize' and `idx' for page offsets within a
map entry.
o In vm_object_split(), `idx' is a page offset within a map entry.


# 6395da54 25-Jun-2002 Ian Dowse <iedowse@FreeBSD.org>

Complete the initial set of VM changes required to support full
64-bit file sizes. This step simply addresses the remaining overflows,
and does attempt to optimise performance. The details are:

o Use a 64-bit type for the vm_object `size' and the size argument
to vm_object_allocate().
o Use the correct type for index variables in dev_pager_getpages(),
vm_object_page_clean() and vm_object_page_remove().
o Avoid an overflow in the i386 pmap_object_init_pt().


# 18aa2de5 17-Jun-2002 Jeff Roberson <jeff@FreeBSD.org>

- Introduce the new M_NOVM option which tells uma to only check the currently
allocated slabs and bucket caches for free items. It will not go ask the vm
for pages. This differs from M_NOWAIT in that it not only doesn't block, it
doesn't even ask.

- Add a new zcreate option ZONE_VM, that sets the BUCKETCACHE zflag. This
tells uma that it should only allocate buckets out of the bucket cache, and
not from the VM. It does this by using the M_NOVM option to zalloc when
getting a new bucket. This is so that the VM doesn't recursively enter
itself while trying to allocate buckets for vm_map_entry zones. If there
are already allocated buckets when we get here we'll still use them but
otherwise we'll skip it.

- Use the ZONE_VM flag on vm map entries and pv entries on x86.


# db17c6fc 29-Apr-2002 Peter Wemm <peter@FreeBSD.org>

Tidy up some loose ends.
i386/ia64/alpha - catch up to sparc64/ppc:
- replace pmap_kernel() with refs to kernel_pmap
- change kernel_pmap pointer to (&kernel_pmap_store)
(this is a speedup since ld can set these at compile/link time)
all platforms (as suggested by jake):
- gc unused pmap_reference
- gc unused pmap_destroy
- gc unused struct pmap.pm_count
(we never used pm_count - we track address space sharing at the vmspace)


# 1a87a0da 15-Apr-2002 Peter Wemm <peter@FreeBSD.org>

Pass vm_page_t instead of physical addresses to pmap_zero_page[_area]()
and pmap_copy_page(). This gets rid of a couple more physical addresses
in upper layers, with the eventual aim of supporting PAE and dealing with
the physical addressing mostly within pmap. (We will need either 64 bit
physical addresses or page indexes, possibly both depending on the
circumstances. Leaving this to pmap itself gives more flexibilitly.)

Reviewed by: jake
Tested on: i386, ia64 and (I believe) sparc64. (my alpha was hosed)


# 46d0abf3 20-Mar-2002 Jeff Roberson <jeff@FreeBSD.org>

Remove references to vm_zone.h and switch over to the new uma API.


# 15fe3067 20-Mar-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove __P.


# 8355f576 19-Mar-2002 Jeff Roberson <jeff@FreeBSD.org>

This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by: arch@


# f4e18c9a 28-Feb-2002 Mike Silbersack <silby@FreeBSD.org>

Fix a minor swap leak.

Previously, the UPAGES/KSTACK area of processes/threads would leak memory
at the time that a previously swapped process was terminated. Lukcily, the
leak was only 12K/proc, so it was unlikely to be a major problem unless you
had an undersized swap partition.

Submitted by: dillon
Reviewed by: silby
MFC after: 1 week


# 7f3a4093 27-Feb-2002 Mike Silbersack <silby@FreeBSD.org>

Fix a horribly suboptimal algorithm in the vm_daemon.

In order to determine what to page out, the vm_daemon checks
reference bits on all pages belonging to all processes. Unfortunately,
the algorithm used reacted badly with shared pages; each shared page
would be checked once per process sharing it; this caused an O(N^2)
growth of tlb invalidations. The algorithm has been changed so that
each page will be checked only 16 times.

Prior to this change, a fork/sleepbomb of 1300 processes could cause
the vm_daemon to take over 60 seconds to complete, effectively
freezing the system for that time period. With this change
in place, the vm_daemon completes in less than a second. Any system
with hundreds of processes sharing pages should benefit from this change.

Note that the vm_daemon is only run when the system is under extreme
memory pressure. It is likely that many people with loaded systems saw
no symptoms of this problem until they reached the point where swapping
began.

Special thanks go to dillon, peter, and Chuck Cranor, who helped me
get up to speed with vm internals.

PR: 33542, 20393
Reviewed by: dillon
MFC after: 1 week


# d1693e17 27-Feb-2002 Peter Wemm <peter@FreeBSD.org>

Back out all the pmap related stuff I've touched over the last few days.
There is some unresolved badness that has been eluding me, particularly
affecting uniprocessor kernels. Turning off PG_G helped (which is a bad
sign) but didn't solve it entirely. Userland programs still crashed.


# 5c004dc6 26-Feb-2002 Peter Wemm <peter@FreeBSD.org>

Bandaid for the Uniprocessor kernel exploding. This makes a UP kernel
boot and run (and indeed I am committing from it) instead of exploding
during the int 0x15 call from inside the atkbd driver to get the keyboard
repeat rates.


# b7eeb587 26-Feb-2002 Alfred Perlstein <alfred@FreeBSD.org>

clarify panic message


# bd1e3a0f 26-Feb-2002 Peter Wemm <peter@FreeBSD.org>

Jake further reduced IPI shootdowns on sparc64 in loops by using ranged
shootdowns in a couple of key places. Do the same for i386. This also
hides some physical addresses from higher levels and has it use the
generic vm_page_t's instead. This will help for PAE down the road.

Obtained from: jake (MI code, suggestions for MD part)


# 9f34c416 26-Feb-2002 Matthew Dillon <dillon@FreeBSD.org>

didn't quite undo the last reversion. This gets it.


# 08b38b1f 26-Feb-2002 Matthew Dillon <dillon@FreeBSD.org>

revert compatibility fix temporarily (thought it would not break anything
leaving it in).


# 24e68cb0 26-Feb-2002 Matthew Dillon <dillon@FreeBSD.org>

Make peter's commit compatible with interrupt-enabled critical_enter()
and exit(), which has already solved the problem in regards to deadlocked
IPI's.


# 6bd95d70 25-Feb-2002 Peter Wemm <peter@FreeBSD.org>

Work-in-progress commit syncing up pmap cleanups that I have been working
on for a while:
- fine grained TLB shootdown for SMP on i386
- ranged TLB shootdowns.. eg: specify a range of pages to shoot down with
a single IPI, since the IPI is very expensive. Adjust some callers
that used to trigger this inside tight loops to do a ranged shootdown
at the end instead.
- PG_G support for SMP on i386 (options ENABLE_PG_G)
- defer PG_G activation till after we decide what we are going to do with
PSE and the 4MB pages at the start of the kernel. This should solve
some rumored strangeness about stale PG_G entries getting stuck
underneath the 4MB pages.
- add some instrumentation for the fine TLB shootdown
- convert some asm instruction wrappers from functions to inlines. gcc
seems to do a fair bit better with this.
- [temporarily!] pessimize the tlb shootdown IPI handlers. I will fix
this again shortly.

This has been working fairly well for me for a while, but I have tweaked
it again prior to commit since my last major testing round. The only
outstanding problem that I know of is PG_G related, which is why there
is an option for it (not on by default for SMP). I have seen a world
speedups by a few percent (as much as 4 or 5% in one case) but I have
*not* accurately measured this - I am a bit sceptical of these numbers.


# 98f1484c 20-Feb-2002 Peter Wemm <peter@FreeBSD.org>

Pass me the pointy hat please. Be sure to return a value in a non-void
function. I've been running with this buried in the mountains of compiler
output for about a month on my desktop.


# 6a3e90ef 19-Feb-2002 Peter Wemm <peter@FreeBSD.org>

Some more tidy-up of stray "unsigned" variables instead of p[dt]_entry_t
etc.


# ead8168a 05-Jan-2002 Peter Wemm <peter@FreeBSD.org>

Convert a bunch of 1 << PCPU_GET(cpuid) to PCPU_GET(cpumask).


# 7ff48af7 02-Jan-2002 Peter Wemm <peter@FreeBSD.org>

Allow a specific setting for pv entries. This avoids the need to guess
(or calculate by hand) the effect of interactions between shpgperproc,
physical ram size, maxproc, maxdsiz, etc.


# 587bd8bf 31-Dec-2001 Matthew Dillon <dillon@FreeBSD.org>

Grrr. The tlb code is strewn over 3 files and I misread it. Revert
the last change (it was a NOP), and remove the XXX comments that no longer
apply.


# b00dcfdf 31-Dec-2001 Matthew Dillon <dillon@FreeBSD.org>

You know those 'XXX what about SMP' comments in pmap_kenter()? Well,
they were right. Fix both kenter() and kremove() for SMP by ensuring that
the tlb is flushed on other cpu's. This will directly solve random-corruption
panic issues in -stable when it is MFC'd. Better to be safe then sorry, we
can optimize this later.

Original Suspicion by: peter
Maybe MFC: immediately on re's permission


# ff5a52e1 19-Dec-2001 Peter Wemm <peter@FreeBSD.org>

Replace a bunch of:
for (pv = TAILQ_FIRST(&m->md.pv_list);
pv;
pv = TAILQ_NEXT(pv, pv_list)) {
with:
TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) {


# c04cbb47 19-Dec-2001 Peter Wemm <peter@FreeBSD.org>

Fix some whitespace nits, and a minor error that I made in some unused
#ifdef DEBUG code (VM_MAXUSER_ADDRESS vs UPT_MAX_ADDRESS).


# 0bbc8826 11-Dec-2001 John Baldwin <jhb@FreeBSD.org>

Overhaul the per-CPU support a bit:

- The MI portions of struct globaldata have been consolidated into a MI
struct pcpu. The MD per-CPU data are specified via a macro defined in
machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the
interface would be cleaner (PCPU_GET(my_md_field) vs.
PCPU_GET(md.md_my_md_field)).
- All references to globaldata are changed to pcpu instead. In a UP kernel,
this data was stored as global variables which is where the original name
came from. In an SMP world this data is per-CPU and ideally private to each
CPU outside of the context of debuggers. This also included combining
machine/globaldata.h and machine/globals.h into machine/pcpu.h.
- The pointer to the thread using the FPU on i386 was renamed from
npxthread to fpcurthread to be identical with other architectures.
- Make the show pcpu ddb command MI with a MD callout to display MD
fields.
- The globaldata_register() function was renamed to pcpu_init() and now
init's MI fields of a struct pcpu in addition to registering it with
the internal array and list.
- A pcpu_destroy() function was added to remove a struct pcpu from the
internal array and list.

Tested on: alpha, i386
Reviewed by: peter, jake


# eaef7150 10-Dec-2001 Peter Wemm <peter@FreeBSD.org>

Delete some leftover code from a bygone age. We dont have an array of
IdlePTDS anymore and dont to the PTD[MPPTDI] swapping etc.


# 720c992f 16-Nov-2001 Peter Wemm <peter@FreeBSD.org>

Fix the non-KSTACK_GUARD case.. It has been broken since the KSE
commit. ptek was not been initialized.


# 6729cb88 16-Nov-2001 Peter Wemm <peter@FreeBSD.org>

Start bringing i386/pmap.c into line with cleanups that were done to
alpha pmap. In particular -
- pd_entry_t and pt_entry_t are now u_int32_t instead of a pointer.
This is to enable cleaner PAE and x86-64 support down the track sor
that we can change the pd_entry_t/pt_entry_t types to 64 bit entities.
- Terminate "unsigned *ptep, pte" with extreme prejudice and use the
correct pt_entry_t/pd_entry_t types.
- Various other cosmetic changes to match cleanups elsewhere.
- This eliminates a boatload of casts.
- use VM_MAXUSER_ADDRESS in place of UPT_MIN_ADDRESS in a couple of places
where we're testing user address space limits. Assuming the page tables
start directly after the end of user space is not a safe assumption.
There is still more to go.


# ab29f876 15-Nov-2001 Peter Wemm <peter@FreeBSD.org>

Oops, I accidently merged a whitespace error from the original commit.
(whitespace at end of line in rev 1.264 pmap.c). Fix them all.


# 8ad88132 15-Nov-2001 Peter Wemm <peter@FreeBSD.org>

Converge/fix some debug code (#if 0'ed on alpha, but whatever)
- use NPTEPG/NPDEPG instead of magic 1024 (important for PAE)
- use pt_entry_t instead of unsigned (important for PAE)
- use vm_offset_t instead of unsigned for va's (important for x86-64)


# e258f08a 31-Oct-2001 Peter Wemm <peter@FreeBSD.org>

Skip PG_UNMANAGED pages when we're shooting everything down to try and
reclaim pv_entries. PG_UNMANAGED pages dont have pv_entries to reclaim.

Reported by: David Xu <davidx@viasoft.com.cn>


# e3026983 30-Oct-2001 Matthew Dillon <dillon@FreeBSD.org>

Don't let pmap_object_init_pt() exhaust all available free pages
(allocating pv entries w/ zalloci) when called in a loop due to
an madvise(). It is possible to completely exhaust the free page list and
cause a system panic when an expected allocation fails.


# 23340918 14-Oct-2001 Tor Egge <tegge@FreeBSD.org>

Reduce the number of TLB shootdowns caused by a call to pmap_qenter()
from number of pages mapped to 1.

Reviewed by: dillon


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 43295941 30-Aug-2001 Peter Wemm <peter@FreeBSD.org>

Do a style cleanup pass for the pmap_{new,dispose,etc}_proc() functions
to get them closer to the KSE tree. I will do the other $machine/pmap.c
files shortly.


# 15dac10b 24-Aug-2001 Julian Elischer <julian@FreeBSD.org>

Add another comment.
check for 'teh's this time..


# 268bdb43 24-Aug-2001 Peter Wemm <peter@FreeBSD.org>

Optionize UPAGES for the i386. As part of this I split some of the low
level implementation stuff out of machine/globaldata.h to avoid exposing
UPAGES to lots more places. The end result is that we can double
the kernel stack size with 'options UPAGES=4' etc.

This is mainly being done for the benefit of a MFC to RELENG_4 at some
point. -current doesn't really need this so much since each interrupt
runs on its own kstack.


# 219d632c 21-Aug-2001 Matthew Dillon <dillon@FreeBSD.org>

Move most of the kernel submap initialization code, including the
timeout callwheel and buffer cache, out of the platform specific areas
and into the machine independant area. i386 and alpha adjusted here.
Other cpus can be fixed piecemeal.

Reviewed by: freebsd-smp, jake


# 3d6fde76 21-Aug-2001 Peter Wemm <peter@FreeBSD.org>

Introduce two new sysctl's.. vm.kvm_size and vm.kvm_free. These are
purely informational and can give some advance indications of tuning
problems. These are i386 only for now as it seems that the i386 is
the only one suffering kvm pressure.


# 0b27d710 26-Jul-2001 Peter Wemm <peter@FreeBSD.org>

Make PMAP_SHPGPERPROC tunable. One shouldn't need to recompile a kernel
for this, since it is easy to run into with large systems with lots of
shared mmap space.

Obtained from: yahoo


# 0cddd8f0 04-Jul-2001 Matthew Dillon <dillon@FreeBSD.org>

With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.


# 1f50e112 23-May-2001 Alfred Perlstein <alfred@FreeBSD.org>

pmap_mapdev needs the vm_mtx, aquire it if not already locked


# 9dceb26b 21-May-2001 John Baldwin <jhb@FreeBSD.org>

Sort includes.


# 23955314 18-May-2001 Alfred Perlstein <alfred@FreeBSD.org>

Introduce a global lock for the vm subsystem (vm_mtx).

vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb


# fb919e4d 01-May-2001 Mark Murray <markm@FreeBSD.org>

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


# 1005a129 28-Mar-2001 John Baldwin <jhb@FreeBSD.org>

Convert the allproc and proctree locks from lockmgr locks to sx locks.


# 50e2347e 14-Mar-2001 Peter Wemm <peter@FreeBSD.org>

Kill the 4MB kernel limit dead. [I hope :-)].
For UP, we were using $tmp_stk as a stack from the data section. If the
kernel text section grew beyond ~3MB, the data section would be pushed
beyond the temporary 4MB P==V mapping. This would cause the trampoline
up to high memory to fault. The hack workaround I did was to use all of
the page table pages that we already have while preparing the initial
P==V mapping, instead of just the first one.
For SMP, the AP bootstrap process suffered the same sort of problem and
got the same treatment.

MFC candidate - this breaks on 4.x just the same..

Thanks to: Richard Todd <rmtodd@ichotolot.servalan.com>


# 136d8f42 06-Mar-2001 John Baldwin <jhb@FreeBSD.org>

Unrevert the pmap_map() changes. They weren't broken on x86.

Sense beaten into me by: peter


# 4a01ebd4 06-Mar-2001 John Baldwin <jhb@FreeBSD.org>

Back out the pmap_map() change for now, it isn't completely stable on the
i386.


# 968950e5 05-Mar-2001 John Baldwin <jhb@FreeBSD.org>

- Rework pmap_map() to take advantage of direct-mapped segments on
supported architectures such as the alpha. This allows us to save
on kernel virtual address space, TLB entries, and (on the ia64) VHPT
entries. pmap_map() now modifies the passed in virtual address on
architectures that do not support direct-mapped segments to point to
the next available virtual address. It also returns the actual
address that the request was mapped to.
- On the IA64 don't use a special zone of PV entries needed for early
calls to pmap_kenter() during pmap_init(). This gets us in trouble
because we end up trying to use the zone allocator before it is
initialized. Instead, with the pmap_map() change, the number of needed
PV entries is small enough that we can get by with a static pool that is
used until pmap_init() is complete.

Submitted by: dfr
Debugging help: peter
Tested by: me


# f1532aad 22-Feb-2001 Peter Wemm <peter@FreeBSD.org>

Activate USER_LDT by default. The new thread libraries are going to
depend on this. The linux ABI emulator tries to use it for some linux
binaries too. VM86 had a bigger cost than this and it was made default
a while ago.

Reviewed by: jhb, imp


# 7ad1d369 29-Jan-2001 John Baldwin <jhb@FreeBSD.org>

Remove unnecessary locking to protect the p_upages_obj and p_addr
pointers.


# 16fdce53 24-Jan-2001 John Baldwin <jhb@FreeBSD.org>

- Proc locking.
- P_INMEM -> PS_INMEM.


# a3ea6d41 21-Jan-2001 Dag-Erling Smørgrav <des@FreeBSD.org>

First step towards an MP-safe zone allocator:
- have zalloc() and zfree() always lock the vm_zone.
- remove zalloci() and zfreei(), which are now redundant.

Reviewed by: bmilekic, jasone


# 3e899e10 20-Jan-2001 Jake Burkholder <jake@FreeBSD.org>

Remove the per-cpu pages used for copy and zero-ing pages of memory
for SMP; just use the same ones as UP. These weren't used without
holding Giant anyway, and the routines that use them would have to
be protected from pre-emption to avoid migrating cpus.


# e44a0ea3 16-Jan-2001 Peter Wemm <peter@FreeBSD.org>

Stop doing runtime checking on i386 cpus for cpu class. The cpu is
slow enough as it is, without having to constantly check that it really
is an i386 still. It was possible to compile out the conditionals for
faster cpus by leaving out 'I386_CPU', but it was not possible to
unconditionally compile for the i386. You got the runtime checking whether
you wanted it or not. This makes I386_CPU mutually exclusive with the
other cpu types, and tidies things up a little in the process.

Reviewed by: alfred, markm, phk, benno, jlemon, jhb, jake, grog, msmith,
jasone, dcs, des (and a bunch more people who encouraged it)


# ef73ae4b 09-Jan-2001 Jake Burkholder <jake@FreeBSD.org>

Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables
other then curproc.


# c0c25570 12-Dec-2000 Jake Burkholder <jake@FreeBSD.org>

- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr. Also provides macros for the flags
pased to specify shared, exclusive or release which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.


# 553629eb 22-Nov-2000 Jake Burkholder <jake@FreeBSD.org>

Protect the following with a lockmgr lock:

allproc
zombproc
pidhashtbl
proc.p_list
proc.p_hash
nextpid

Reviewed by: jhb
Obtained from: BSD/OS and netbsd


# b5d335ad 07-Nov-2000 Alfred Perlstein <alfred@FreeBSD.org>

Protect against an infinite loop when prefaulting pages. This can
happen when the vm system maps past the end of an object or tries
to map a zero length object, the pmap layer misses the fact that
offsets wrap into negative numbers and we get stuck.

Found by: Joost Pol aka Nohican <nohican@marcella.niets.org>
Submitted by: tegge


# c794ceb5 17-Oct-2000 Paul Saab <ps@FreeBSD.org>

Implement write combining for crashdumps. This is useful when
write caching is disabled on both SCSI and IDE disks where large
memory dumps could take up to an hour to complete.

Taking an i386 scsi based system with 512MB of ram and timing (in
seconds) how long it took to complete a dump, the following results
were obtained:

Before: After:
WCE TIME WCE TIME
------------------ ------------------
1 141.820972 1 15.600111
0 797.265072 0 65.480465

Obtained from: Yahoo!
Reviewed by: peter


# 12e8a79c 05-Oct-2000 John Baldwin <jhb@FreeBSD.org>

Replace loadandclear() with atomic_readandclear_int().


# 7321545f 22-Sep-2000 Paul Saab <ps@FreeBSD.org>

Remove the NCPU, NAPIC, NBUS, NINTR config options. Make NAPIC,
NBUS, NINTR dynamic and set NCPU to a maximum of 16 under SMP.

Reviewed by: peter


# 300019c4 19-Sep-2000 Eivind Eklund <eivind@FreeBSD.org>

Better error message when booting an SMP kernel on an UP system.


# 0384fff8 06-Sep-2000 Jason Evans <jasone@FreeBSD.org>

Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).

Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh


# d9b05734 16-Aug-2000 Tor Egge <tegge@FreeBSD.org>

Prepare for a cleanup of pmap module API pollution introduced by the
suggested fix in PR 12378.

Keep track of all existing pmaps independent of existing processes.

This allows for a process to temporarily connect to a different address
space without the risk of missing an update of the original address space if
the kernel grows.

pmap_pinit2() is no longer needed on the i386 platform but is left as a
stub until the alpha pmap code is updated.

PR: 12378


# 8b03c8ed 29-May-2000 Matthew Dillon <dillon@FreeBSD.org>

This is a cleanup patch to Peter's new OBJT_PHYS VM object type
and sysv shared memory support for it. It implements a new
PG_UNMANAGED flag that has slightly different characteristics
from PG_FICTICIOUS.

A new sysctl, kern.ipc.shm_use_phys has been added to enable the
use of physically-backed sysv shared memory rather then swap-backed.
Physically backed shm segments are not tracked with PV entries,
allowing programs which use a large shm segment as a rendezvous
point to operate without eating an insane amount of KVM in the
PV entry management. Read: Oracle.

Peter's OBJT_PHYS object will also allow us to eventually implement
page-table sharing and/or 4MB physical page support for such segments.
We're half way there.


# 1536418a 29-May-2000 Doug Rabson <dfr@FreeBSD.org>

Brucify the pmap_enter_temporary() changes.


# 31891bc2 28-May-2000 Doug Rabson <dfr@FreeBSD.org>

Add a new pmap entry point, pmap_enter_temporary() to be used during
dumps to create temporary page mappings. This replaces the use of CADDR1
which is fairly x86 specific.

Reviewed by: dillon


# 0385347c 20-May-2000 Peter Wemm <peter@FreeBSD.org>

Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly
to various pmap_*() functions instead of looking up the physical address
and passing that. In many cases, the first thing the pmap code was doing
was going to a lot of trouble to get back the original vm_page_t, or
it's shadow pv_table entry.

Inspired by: John Dyson's 1998 patches.

Also:
Eliminate pv_table as a seperate thing and build it into a machine
dependent part of vm_page_t. This eliminates having a seperate set of
structions that shadow each other in a 1:1 fashion that we often went to
a lot of trouble to translate from one to the other. (see above)
This happens to save 4 bytes of physical memory for each page in the
system. (8 bytes on the Alpha).

Eliminate the use of the phys_avail[] array to determine if a page is
managed (ie: it has pv_entries etc). Store this information in a flag.
Things like device_pager set it because they create vm_page_t's on the
fly that do not have pv_entries. This makes it easier to "unmanage" a
page of physical memory (this will be taken advantage of in subsequent
commits).

Add a function to add a new page to the freelist. This could be used
for reclaiming the previously wasted pages left over from preloaded
loader(8) files.

Reviewed by: dillon


# c71d5570 20-Apr-2000 Luoqi Chen <luoqi@FreeBSD.org>

IO apics are not necessarily page aligned, they are only required to be aligned
on 1K boundary. Correct a typo that would cause problem to a second IO apic.

Pointed out by: Steve Passe <smp.csn.net>


# db5f635a 16-Mar-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Eliminate the undocumented, experimental, non-delivering and highly
dangerous MAX_PERF option.


# 311b554b 13-Mar-2000 Bruce Evans <bde@FreeBSD.org>

Disabled the optimization of not doing an invltlb_1pg() when changing
pte's from zero. The TLB is supposed to be invalidated when pte's are
changed _to_ zero, but this doesn't occur in all cases for global pages
(PG_G stops invltlb() from working, and invltlb_1pg() is not used
enough).

PR: 14141, 16568
Submitted by: dillon


# 2996376a 19-Nov-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Use LIST_FOREACH to traverse the allproc list.

Submitted by: Jake Burkholder jake@checker.org


# 923502ff 29-Oct-1999 Poul-Henning Kamp <phk@FreeBSD.org>

useracc() the prequel:

Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.


# 1a16554b 11-Sep-1999 Peter Wemm <peter@FreeBSD.org>

Make pmap_mapdev() deal with non-page-aligned requests.
Add a corresponding pmap_unmapdev() to release the KVM back to kernel_map.


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# 7308467d 11-Aug-1999 Alan Cox <alc@FreeBSD.org>

_pmap_allocpte:
If the pte page isn't PQ_NONE, panic rather than silently
covering up the problem.


# 7f8d2279 09-Aug-1999 Alan Cox <alc@FreeBSD.org>

pmap_remove_pages:
Add KASSERT to detect out of range access to the pv_table and
report the errant pte before it's overwritten.


# eaf183a8 31-Jul-1999 Alan Cox <alc@FreeBSD.org>

pmap_object_init_pt:
Verify that object != NULL.


# 086d0ae1 30-Jul-1999 Alan Cox <alc@FreeBSD.org>

Add parentheses for clarity.

Submitted by: dillon


# d4da2dba 21-Jul-1999 Alan Cox <alc@FreeBSD.org>

Fix the following problem:

When creating new processes (or performing exec), the new page
directory is initialized too early. The kernel might grow before
p_vmspace is initialized for the new process. Since pmap_growkernel
doesn't yet know about the new page directory, it isn't updated, and
subsequent use causes a failure.

The fix is (1) to clear p_vmspace early, to stop pmap_growkernel
from stomping on memory, and (2) to defer part of the initialization
of new page directories until p_vmspace is initialized.

PR: kern/12378
Submitted by: tegge
Reviewed by: dfr


# ad8ac923 08-Jul-1999 Kirk McKusick <mckusick@FreeBSD.org>

These changes appear to give us benefits with both small (32MB) and
large (1G) memory machine configurations. I was able to run 'dbench 32'
on a 32MB system without bring the machine to a grinding halt.

* buffer cache hash table now dynamically allocated. This will
have no effect on memory consumption for smaller systems and
will help scale the buffer cache for larger systems.

* minor enhancement to pmap_clearbit(). I noticed that
all the calls to it used constant arguments. Making
it an inline allows the constants to propogate to
deeper inlines and should produce better code.

* removal of inherent vfs_ioopt support through the emplacement
of appropriate #ifdef's, with John's permission. If we do not
find a use for it by the end of the year we will remove it entirely.

* removal of getnewbufloops* counters & sysctl's - no longer
necessary for debugging, getnewbuf() is now optimal.

* buffer hash table functions removed from sys/buf.h and localized
to vfs_bio.c

* VFS_BIO_NEED_DIRTYFLUSH flag and support code added
( bwillwrite() ), allowing processes to block when too many dirty
buffers are present in the system.

* removal of a softdep test in bdwrite() that is no longer necessary
now that bdwrite() no longer attempts to flush dirty buffers.

* slight optimization added to bqrelse() - there is no reason
to test for available buffer space on B_DELWRI buffers.

* addition of reverse-scanning code to vfs_bio_awrite().
vfs_bio_awrite() will attempt to locate clusterable areas
in both the forward and reverse direction relative to the
offset of the buffer passed to it. This will probably not
make much of a difference now, but I believe we will start
to rely on it heavily in the future if we decide to shift
some of the burden of the clustering closer to the actual
I/O initiation.

* Removal of the newbufcnt and lastnewbuf counters that Kirk
added. They do not fix any race conditions that haven't already
been fixed by the gbincore() test done after the only call
to getnewbuf(). getnewbuf() is a static, so there is no chance
of it being misused by other modules. ( Unless Kirk can think
of a specific thing that this code fixes. I went through it
very carefully and didn't see anything ).

* removal of VOP_ISLOCKED() check in flushbufqueues(). I do not
think this check is necessary, the buffer should flush properly
whether the vnode is locked or not. ( yes? ).

* removal of extra arguments passed to getnewbuf() that are not
necessary.

* missed cluster_wbuild() that had to be a cluster_wbuild_wb() in
vfs_cluster.c

* vn_write() now calls bwillwrite() *PRIOR* to locking the vnode,
which should greatly aid flushing operations in heavy load
situations - both the pageout and update daemons will be able
to operate more efficiently.

* removal of b_usecount. We may add it back in later but for now
it is useless. Prior implementations of the buffer cache never
had enough buffers for it to be useful, and current implementations
which make more buffers available might not benefit relative to
the amount of sophistication required to implement a b_usecount.
Straight LRU should work just as well, especially when most things
are VMIO backed. I expect that (even though John will not like
this assumption) directories will become VMIO backed some point soon.

Submitted by: Matthew Dillon <dillon@backplane.com>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>


# 541e0187 23-Jun-1999 Luoqi Chen <luoqi@FreeBSD.org>

Do not setup 4M pdir until all APs are up.


# 21053753 08-Jun-1999 Dmitrij Tejblum <dt@FreeBSD.org>

Use kmem_alloc_nofault() rather than kmem_alloc_pageable() to allocate
kernel virtual address space for UPAGES.


# 31fdd69a 05-Jun-1999 Luoqi Chen <luoqi@FreeBSD.org>

Fix an accounting problem when prefaulting 4M pages.

PR: kern/11948


# eb9d435a 01-Jun-1999 Jonathan Lemon <jlemon@FreeBSD.org>

Unifdef VM86.

Reviewed by: silence on on -current


# 72e51821 27-May-1999 Alan Cox <alc@FreeBSD.org>

pmap_object_init_pt:
The size of vm_object::memq is vm_object::resident_page_count,
not vm_object::size.


# 88b67a96 18-May-1999 Alan Cox <alc@FreeBSD.org>

pmap_qremove:
Eliminate unnecessary TLB shootdowns.


# 5206bca1 27-Apr-1999 Luoqi Chen <luoqi@FreeBSD.org>

Enable vmspace sharing on SMP. Major changes are,
- %fs register is added to trapframe and saved/restored upon kernel entry/exit.
- Per-cpu pages are no longer mapped at the same virtual address.
- Each cpu now has a separate gdt selector table. A new segment selector
is added to point to per-cpu pages, per-cpu global variables are now
accessed through this new selector (%fs). The selectors in gdt table are
rearranged for cache line optimization.
- fask_vfork is now on as default for both UP and SMP.
- Some aio code cleanup.

Reviewed by: Alan Cox <alc@cs.rice.edu>
John Dyson <dyson@iquest.net>
Julian Elischer <julian@whistel.com>
Bruce Evans <bde@zeta.org.au>
David Greenman <dg@root.com>


# 51bb7ba6 25-Apr-1999 Alan Cox <alc@FreeBSD.org>

pmap_dispose_proc and pmap_copy_page:
Conditionally compile 386-specific code.

pmap_enter:
Eliminate unnecessary TLB shootdowns.

pmap_zero_page and pmap_zero_page_area:
Use invltlb_1pg instead of duplicating the code.


# 11a9f83f 23-Apr-1999 Dmitrij Tejblum <dt@FreeBSD.org>

Make pmap_collect() an official pmap interface.


# 270da415 19-Apr-1999 Alan Cox <alc@FreeBSD.org>

_pmap_unwire_pte_hold and pmap_remove_page:
Use pmap_TLB_invalidate instead of invltlb_1pg to eliminate
unnecessary IPIs.

pmap_remove, pmap_protect and pmap_remove_pages:
Use pmap_TLB_invalidate_all instead of invltlb to eliminate
unnecessary IPIs.

pmap_copy:
Use cpu_invltlb instead of invltlb when updating APTDpde.

pmap_changebit:
Rather than deleting the unused "set bit" option (which may be
useful later), make pmap_changebit an inline that is used
by the new pmap_clearbit procedure.

Collectively, the first three changes reduce the number of TLB shootdown
IPIs by 1/3 for a kernel compile.


# 53134efb 09-Apr-1999 Alan Cox <alc@FreeBSD.org>

pmap_remove_pte:
Use "loadandclear" to update the pte.

pmap_changebit and pmap_ts_referenced:
Switch to pmap_TLB_invalidate from invltlb.


# 4ffd949e 06-Apr-1999 Mike Smith <msmith@FreeBSD.org>

mem.c
Split out ioctl handler a little more cleanly, add memory
range attribute handling for both kernel and user-space
consumers.

pmap.c
Remove obsolete P6 MTRR-related code.

i686_mem.c
Map generic memory-range attribute interface to the P6 MTRR
model.


# 47b9dbd4 05-Apr-1999 Alan Cox <alc@FreeBSD.org>

Two changes to pmap_remove_all:

1. Switch to pmap_TLB_invalidate from invltlb, eliminating a full TLB
flush where a single-page flush suffices. (Also, this eliminates some
unnecessary IPIs.)

2. Use "loadandclear" to update the pte, eliminating a race condition
on SMPs.

Change #2 should be committed to -STABLE.


# 8d17e694 05-Apr-1999 Julian Elischer <julian@FreeBSD.org>

Catch a case spotted by Tor where files mmapped could leave garbage in the
unallocated parts of the last page when the file ended on a frag
but not a page boundary.
Delimitted by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF,
in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h
vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c
ufs/ufs/ufs_readwrite.c kern/vfs_bio.c

Submitted by: Matt Dillon <dillon@freebsd.org>
Reviewed by: Alan Cox <alc@freebsd.org>


# 087e80a9 02-Apr-1999 Alan Cox <alc@FreeBSD.org>

Put in place the infrastructure for improved UP and SMP TLB management.

In particular, replace the unused field pmap::pm_flag by pmap::pm_active,
which is a bit mask representing which processors have the pmap activated.
(Thus, it is a simple Boolean on UPs.)

Also, eliminate an unnecessary memory reference from cpu_switch()
in swtch.s.

Assisted by: John S. Dyson <dyson@iquest.net>
Tested by: Luoqi Chen <luoqi@watermarkgroup.com>,
Poul-Henning Kamp <phk@critter.freebsd.dk>


# 10e77073 13-Mar-1999 Alan Cox <alc@FreeBSD.org>

pmap_qenter/pmap_qremove:
Use the pmap_kenter/pmap_kremove inline functions
instead of duplicating them.

pmap_remove_all:
Eliminate an unused (but initialized) variable.

pmap_ts_reference:
Change the implementation. The new implementation is much smaller
and simpler, but functionally identical. (Reviewed by
"John S. Dyson" <dyson@iquest.net>.)


# 901671c0 05-Mar-1999 Alan Cox <alc@FreeBSD.org>

Fix an SMP-only TLB invalidation bug. Specifically, disable
a TLB invalidation optimization that won't work given the
limitations of our current SMP support.

This patch should be applied to -stable ASAP.

Thanks to John Capo <jc@irbs.com>,
Steve Kargl <sgk@troutmask.apl.washington.edu>, and
Chuck Robey <chuckr@mat.net>
for testing.


# b1028ad1 19-Feb-1999 Luoqi Chen <luoqi@FreeBSD.org>

Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This
is the preparation step for moving pmap storage out of vmspace proper.

Reviewed by: Alan Cox <alc@cs.rice.edu>
Matthew Dillion <dillon@apollo.backplane.com>


# 0a5e03dd 27-Jan-1999 Matthew Dillon <dillon@FreeBSD.org>

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


# 7dbf82dc 23-Jan-1999 Matthew Dillon <dillon@FreeBSD.org>

Change all manual settings of vm_page_t->dirty = VM_PAGE_BITS_ALL
to use the vm_page_dirty() inline.

The inline can thus do sanity checks ( or not ) over all cases.


# 1c7c3c6a 21-Jan-1999 Matthew Dillon <dillon@FreeBSD.org>

This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.

Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>


# fd8d7e38 11-Jan-1999 Eivind Eklund <eivind@FreeBSD.org>

Silence warnings by removing unused convenience function and
globalizing debugging functions.


# c197d61d 09-Jan-1999 Dmitrij Tejblum <dt@FreeBSD.org>

Oops --<, replace 1.216 with a version that actually check pv_entries (and
was tested for month or two in production).

Noticed by: Stephen McKay

Stephen also suggested to remove the complication at all. I don't do it as
it would be backout of a large part of 1.190 (from 1998/03/16)...


# fcf37ac3 08-Jan-1999 Luoqi Chen <luoqi@FreeBSD.org>

Allocate kernel page table object (kptobj) before any kmem_alloc calls.
On a system with a large amount of ram (e.g. 2G), allocation of per-page
data structures (512K physical pages) could easily bust the initial kernel
page table (36M), and growth of kernel page table requires kptobj.


# 6143bceb 07-Jan-1999 Dmitrij Tejblum <dt@FreeBSD.org>

Make pmap_ts_referenced check more than 1 pv_entry. (One should be carefull
when move elements to the tail of a list in a loop...)


# f1d19042 07-Dec-1998 Archie Cobbs <archie@FreeBSD.org>

The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.


# 18830dba 26-Nov-1998 Tor Egge <tegge@FreeBSD.org>

Don't forget to update the pmap associated with aio daemons when adding
new page directory entries for a growing kernel virtual address space.


# 38cc2d93 24-Nov-1998 Eivind Eklund <eivind@FreeBSD.org>

Move the declaration of PPro_vmtrr from the header file to pmap.c,
replacing the one in the header file with a definition. This makes it
easier to work with tools that grok ANSI C only.


# 1916cd29 07-Nov-1998 Mike Smith <msmith@FreeBSD.org>

Enable 686 class optimisations for all 686-class processors, not just the
Pentium Pro. This resolves the "Dog slow SMP" issue for Pentium II
systems.


# 73007561 28-Oct-1998 David Greenman <dg@FreeBSD.org>

Added a second argument, "activate" to the vm_page_unwire() call so that
the caller can select either inactive or active queue to put the page on.


# 9b827155 21-Oct-1998 David Greenman <dg@FreeBSD.org>

Decrement the now unused page table page's wire_count prior to freeing it.
It will soon be required that pages have a zero wire_count when being
freed.


# 3bfe64f9 06-Sep-1998 Tor Egge <tegge@FreeBSD.org>

Don't go below the low water mark of free pages due to optional prefaulting
of pages.
PR: 2431


# 569d43a2 04-Sep-1998 Andrey A. Chernov <ache@FreeBSD.org>

PAGE_WAKEUP -> vm_page_wakeup


# 1fcee469 23-Aug-1998 Bruce Evans <bde@FreeBSD.org>

Fixed printf format errors.


# 69ed480f 15-Aug-1998 Bruce Evans <bde@FreeBSD.org>

pmap.c:
Cast pointers to (vm_offset_t) instead of to (u_long) (as before) or to
(uintptr_t)(void *) (as would be more correct). Don't cast vm_offset_t's
to (u_long) just to do arithmetic on them.

mp_machdep.c:
Cast pointers to (uintptr_t) instead of to (u_long). Don't forget
to cast pointers to (void *) first or to recover from integral
possible integral promotions, although this is too much work for
machine-dependent code.

vm code generally avoids warnings for pointer vs long size mismatches
by using vm_offset_t to represent pointers; pmap.c often uses plain
`unsigned int' instead of vm_offset_t and didn't use u_long elsewhere,
but this style was messed up by code apparently imported from mp_machdep.c.


# 767dfb80 11-Jul-1998 Bruce Evans <bde@FreeBSD.org>

Fixed printf format errors.


# 3d1af38b 11-Jul-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Don't disable pmap_setdevram() which isn't called, but which could be,
but instead disable pmap_setvidram() which is called, but probably
shouldn't be.

PR: 7227, 7240


# ac1e407b 11-Jul-1998 Bruce Evans <bde@FreeBSD.org>

Fixed printf format errors.


# cf2819cc 21-May-1998 John Dyson <dyson@FreeBSD.org>

Make flushing dirty pages work correctly on filesystems that
unexpectedly do not complete writes even with sync I/O requests.
This should help the behavior of mmaped files when using
softupdates (and perhaps in other circumstances also.)


# 58067a99 19-May-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Make the size of the msgbuf (dmesg) a "normal" option.


# 12d311d0 18-May-1998 Tor Egge <tegge@FreeBSD.org>

Back out part of revision 1.198 commit (clearing kernel stack pages).
By request from David Greenman <dg@root.com>


# 5931a9c2 17-May-1998 Tor Egge <tegge@FreeBSD.org>

For SMP, use prv_PPAGE1/prv_PMAP1 instead of PADDR1/PMAP1.
get_ptbase and pmap_pte_quick no longer generates IPIs.
This should reduce the number of IPIs during heavy paging.


# cf4b29e4 17-May-1998 Tor Egge <tegge@FreeBSD.org>

Clear kernel stack pages before usage.
Correct panic message in pmap_zero_page (s/CMAP /CMAP2 /).


# 424edf1b 15-May-1998 John Dyson <dyson@FreeBSD.org>

Disable the auto-Write Combining setup for the pmap code. This
worked on a couple of machines of mine, but appears to cause problems
on others.


# fcf1880f 11-May-1998 John Dyson <dyson@FreeBSD.org>

Change some tests from CPU_CLASS686 to CPU_686 as appropriate, and
also correct a serious ommision that would cause process faulures
due to forgetting an invltlb type operatino. This was just a
transcription problem.


# 5498a452 10-May-1998 John Dyson <dyson@FreeBSD.org>

Support better performance with P6 architectures and in SMP
mode. Unnecessary TLB flushes removed. More efficient
page zeroing on P6 (modify page only if non-zero.)


# f0175db1 10-May-1998 John Dyson <dyson@FreeBSD.org>

Attempt to set write combining mode for graphics devices.


# 78a81826 19-Apr-1998 Bruce Evans <bde@FreeBSD.org>

Support compiling with gcc -pedantic (don't use a bogus, null cast).


# c1087c13 15-Apr-1998 Bruce Evans <bde@FreeBSD.org>

Support compiling with `gcc -ansi'.


# 55caa497 06-Apr-1998 Peter Wemm <peter@FreeBSD.org>

Bogus casts


# bef608bd 15-Mar-1998 John Dyson <dyson@FreeBSD.org>

Some VM improvements, including elimination of alot of Sig-11
problems. Tor Egge and others have helped with various VM bugs
lately, but don't blame him -- blame me!!!

pmap.c:
1) Create an object for kernel page table allocations. This
fixes a bogus allocation method previously used for such, by
grabbing pages from the kernel object, using bogus pindexes.
(This was a code cleanup, and perhaps a minor system stability
issue.)

pmap.c:
2) Pre-set the modify and accessed bits when prudent. This will
decrease bus traffic under certain circumstances.

vfs_bio.c, vfs_cluster.c:
3) Rather than calculating the beginning virtual byte offset
multiple times, stick the offset into the buffer header, so
that the calculated offset can be reused. (Long long multiplies
are often expensive, and this is a probably unmeasurable performance
improvement, and code cleanup.)

vfs_bio.c:
4) Handle write recursion more intelligently (but not perfectly) so
that it is less likely to cause a system panic, and is also
much more robust.

vfs_bio.c:
5) getblk incorrectly wrote out blocks that are incorrectly sized.
The problem is fixed, and writes blocks out ONLY when B_DELWRI
is true.

vfs_bio.c:
6) Check that already constituted buffers have fully valid pages. If
not, then make sure that the B_CACHE bit is not set. (This was
a major source of Sig-11 type problems.)

vfs_bio.c:
7) Fix a potential system deadlock due to an incorrectly specified
sleep priority while waiting for a buffer write operation. The
change that I made opens the system up to serious problems, and
we need to examine the issue of process sleep priorities.

vfs_cluster.c, vfs_bio.c:
8) Make clustered reads work more correctly (and more completely)
when buffers are already constituted, but not fully valid.
(This was another system reliability issue.)

vfs_subr.c, ffs_inode.c:
9) Create a vtruncbuf function, which is used by filesystems that
can truncate files. The vinvalbuf forced a file sync type operation,
while vtruncbuf only invalidates the buffers past the new end of file,
and also invalidates the appropriate pages. (This was a system reliabiliy
and performance issue.)

10) Modify FFS to use vtruncbuf.

vm_object.c:
11) Make the object rundown mechanism for OBJT_VNODE type objects work
more correctly. Included in that fix, create pager entries for
the OBJT_DEAD pager type, so that paging requests that might slip
in during race conditions are properly handled. (This was a system
reliability issue.)

vm_page.c:
12) Make some of the page validation routines be a little less picky
about arguments passed to them. Also, support page invalidation
change the object generation count so that we handle generation
counts a little more robustly.

vm_pageout.c:
13) Further reduce pageout daemon activity when the system doesn't
need help from it. There should be no additional performance
decrease even when the pageout daemon is running. (This was
a significant performance issue.)

vnode_pager.c:
14) Teach the vnode pager to handle race conditions during vnode
deallocations.


# 005092bb 09-Mar-1998 Eivind Eklund <eivind@FreeBSD.org>

Turn "PMAP_SHPGPERPROC" into a new-style option, add it to LINT, and
document it there.


# 8f9110f6 07-Mar-1998 John Dyson <dyson@FreeBSD.org>

This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code. These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances. Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code. This code might have been committed seperately, but
almost everything is interrelated.

1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
are fully valid.
2) Rather than deactivating erroneously read initial (header) pages in
kern_exec, we now free them.
3) Fix the rundown of non-VMIO buffers that are in an inconsistent
(missing vp) state.
4) Fix the disassociation of pages from buffers in brelse. The previous
code had rotted and was faulty in a couple of important circumstances.
5) Remove a gratuitious buffer wakeup in vfs_vmio_release.
6) Remove a crufty and currently unused cluster mechanism for VBLK
files in vfs_bio_awrite. When the code is functional, I'll add back
a cleaner version.
7) The page busy count wakeups assocated with the buffer cache usage were
incorrectly cleaned up in a previous commit by me. Revert to the
original, correct version, but with a cleaner implementation.
8) The cluster read code now tries to keep data associated with buffers
more aggressively (without breaking the heuristics) when it is presumed
that the read data (buffers) will be soon needed.
9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The
delay loop waiting is not useful for filesystem locks, due to the
length of the time intervals.
10) Correct and clean-up spec_getpages.
11) Implement a fully functional nfs_getpages, nfs_putpages.
12) Fix nfs_write so that modifications are coherent with the NFS data on
the server disk (at least as well as NFS seems to allow.)
13) Properly support MS_INVALIDATE on NFS.
14) Properly pass down MS_INVALIDATE to lower levels of the VM code from
vm_map_clean.
15) Better support the notion of pages being busy but valid, so that
fewer in-transit waits occur. (use p->busy more for pageouts instead
of PG_BUSY.) Since the page is fully valid, it is still usable for
reads.
16) It is possible (in error) for cached pages to be busy. Make the
page allocation code handle that case correctly. (It should probably
be a printf or panic, but I want the system to handle coding errors
robustly. I'll probably add a printf.)
17) Correct the design and usage of vm_page_sleep. It didn't handle
consistancy problems very well, so make the design a little less
lofty. After vm_page_sleep, if it ever blocked, it is still important
to relookup the page (if the object generation count changed), and
verify it's status (always.)
18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20) Fix vm_pager_put_pages and it's descendents to support an int flag
instead of a boolean, so that we can pass down the invalidate bit.


# ffc82b0a 28-Feb-1998 John Dyson <dyson@FreeBSD.org>

1) Use a more consistent page wait methodology.
2) Do not unnecessarily force page blocking when paging
pages out.
3) Further improve swap pager performance and correctness,
including fixing the paging in progress deadlock (except
in severe I/O error conditions.)
4) Enable vfs_ioopt=1 as a default.
5) Fix and enable the page prezeroing in SMP mode.

All in all, SMP systems especially should show a significant
improvement in "snappyness."


# 045b6fef 12-Feb-1998 Bruce Evans <bde@FreeBSD.org>

Fixed initialization of the 4MB page. Kernels larger than about 2.75MB
(from _btext to _end) crashed in pmap_bootstrap(). Smaller kernels
worked accidentally.


# b2651781 10-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Fix warning after previous staticization.


# 303b270b 08-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Staticize.


# 0b08f5f7 05-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Back out DIAGNOSTIC changes.


# 95461b45 04-Feb-1998 John Dyson <dyson@FreeBSD.org>

1) Start using a cleaner and more consistant page allocator instead
of the various ad-hoc schemes.
2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup.
3) When appropriate, set the PG_A or PG_M bits a-priori to both avoid some
processor errata, and to minimize redundant processor updating of page
tables.
4) Modify pmap_protect so that it can only remove permissions (as it
originally supported.) The additional capability is not needed.
5) Streamline read-only to read-write page mappings.
6) For pmap_copy_page, don't enable write mapping for source page.
7) Correct and clean-up pmap_incore.
8) Cluster initial kern_exec pagin.
9) Removal of some minor lint from kern_malloc.
10) Correct some ioopt code.
11) Remove some dead code from the MI swapout routine.
12) Correct vm_object_deallocate (to remove backing_object ref.)
13) Fix dead object handling, that had problems under heavy memory load.
14) Add minor vm_page_lookup improvements.
15) Some pages are not in objects, and make sure that the vm_page.c can
properly support such pages.
16) Add some more page deficit handling.
17) Some minor code readability improvements.


# 47cfdb16 04-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Turn DIAGNOSTIC into a new-style option.


# 44429dc4 03-Feb-1998 Bruce Evans <bde@FreeBSD.org>

Converted DISABLE_PSE to a new-style option.

Fixed some formatting in options.i386.


# eaf13dd7 31-Jan-1998 John Dyson <dyson@FreeBSD.org>

Change the busy page mgmt, so that when pages are freed, they
MUST be PG_BUSY. It is bogus to free a page that isn't busy,
because it is in a state of being "unavailable" when being
freed. The additional advantage is that the page_remove code
has a better cross-check that the page should be busy and
unavailable for other use. There were some minor problems
with the collapse code, and this plugs those subtile "holes."

Also, the vfs_bio code wasn't checking correctly for PG_BUSY
pages. I am going to develop a more consistant scheme for
grabbing pages, busy or otherwise. For now, we are stuck
with the current morass.


# 2d8acc0f 22-Jan-1998 John Dyson <dyson@FreeBSD.org>

VM level code cleanups.

1) Start using TSM.
Struct procs continue to point to upages structure, after being freed.
Struct vmspace continues to point to pte object and kva space for kstack.
u_map is now superfluous.
2) vm_map's don't need to be reference counted. They always exist either
in the kernel or in a vmspace. The vmspaces are managed by reference
counts.
3) Remove the "wired" vm_map nonsense.
4) No need to keep a cache of kernel stack kva's.
5) Get rid of strange looking ++var, and change to var++.
6) Change more data structures to use our "zone" allocator. Added
struct proc, struct vmspace and struct vnode. This saves a significant
amount of kva space and physical memory. Additionally, this enables
TSM for the zone managed memory.
7) Keep ioopt disabled for now.
8) Remove the now bogus "single use" map concept.
9) Use generation counts or id's for data structures residing in TSM, where
it allows us to avoid unneeded restart overhead during traversals, where
blocking might occur.
10) Account better for memory deficits, so the pageout daemon will be able
to make enough memory available (experimental.)
11) Fix some vnode locking problems. (From Tor, I think.)
12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
(experimental.)
13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c
code. Use generation counts, get rid of unneded collpase operations,
and clean up the cluster code.
14) Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step. Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)


# 47221757 17-Jan-1998 John Dyson <dyson@FreeBSD.org>

Tie up some loose ends in vnode/object management. Remove an unneeded
config option in pmap. Fix a problem with faulting in pages. Clean-up
some loose ends in swap pager memory management.

The system should be much more stable, but all subtile bugs aren't fixed yet.


# 841fc368 22-Dec-1997 John Dyson <dyson@FreeBSD.org>

Correct my previous fix for the UPAGES problem.


# adbf9b6f 21-Dec-1997 John Dyson <dyson@FreeBSD.org>

Hopefully fix the problem with the TLB not being updated correctly.
Problem tracked down by bde@freebsd.org, but this is an attempted
efficient fix.


# 82566551 13-Dec-1997 John Dyson <dyson@FreeBSD.org>

After one of my analysis passes to evaluate methods for SMP TLB mgmt, I
noticed some major enhancements available for UP situations. The number
of UP TLB flushes is decreased much more than significantly with these
changes. Since a TLB flush appears to cost minimally approx 80 cycles,
this is a "nice" enhancement, equiv to eliminating between 40 and 160
instructions per TLB flush.

Changes include making sure that kernel threads all use the same PTD,
and eliminate unneeded PTD switches at context switch time.


# 96a73b40 20-Nov-1997 Bruce Evans <bde@FreeBSD.org>

Moved some extern declarations to header files (unused ones to /dev/null).


# 31e52254 07-Nov-1997 Tor Egge <tegge@FreeBSD.org>

Use UPAGES when setting up private pages for SMP (which includes idle stack).


# 0abc78a6 07-Nov-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Rename some local variables to avoid shadowing other local variables.

Found by: -Wshadow


# 4a11ca4e 07-Nov-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Remove a bunch of variables which were unused both in GENERIC and LINT.

Found by: -Wunused


# 55b211e3 28-Oct-1997 Bruce Evans <bde@FreeBSD.org>

Removed unused #includes.


# d80130d3 26-Oct-1997 John Dyson <dyson@FreeBSD.org>

Check to see if the pv_limits are initialized before checking.


# fe3e6985 25-Oct-1997 John Dyson <dyson@FreeBSD.org>

Change the initial amount of memory allocated for pv_entries to be proportional
to the amount of system memory. Also, clean-up some of the new pv_entry
mgmt code.


# 0c8029e9 24-Oct-1997 John Dyson <dyson@FreeBSD.org>

Somehow an error crept in during the previous commit.


# 5985940e 24-Oct-1997 John Dyson <dyson@FreeBSD.org>

Support garbage collecting the pmap pv entries. The management doesn't
happen until the system would have nearly failed anyway, so no signficant
overhead is added. This helps large systems with lots of processes.


# 0a80f406 24-Oct-1997 John Dyson <dyson@FreeBSD.org>

Decrease the initial allocation for the zone allocations.


# 55166637 11-Oct-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Distribute and statizice a lot of the malloc M_* types.

Substantial input from: bde


# a65247e1 20-Sep-1997 John Dyson <dyson@FreeBSD.org>

Add support for more than 1 page of idle process stack on SMP systems.


# 5b05023a 06-Sep-1997 John Dyson <dyson@FreeBSD.org>

Fix an intermittent problem during SMP code operation. Not all of the
idle page table directories for all of the processors was being updated
during kernel grow operations. The problem appears to be gone now.


# 9a3b3e8b 26-Aug-1997 Peter Wemm <peter@FreeBSD.org>

Clean up the SMP AP bootstrap and eliminate the wretched idle procs.

- We now have enough per-cpu idle context, the real idle loop has been
revived (cpu's halt now with nothing to do).
- Some preliminary support for running some operations outside the
global lock (eg: zeroing "free but not yet zeroed pages") is present
but appears to cause problems. Off by default.
- the smp_active sysctl now behaves differently. It's merely a 'true/false'
option. Setting smp_active to zero causes the AP's to halt in the idle
loop and stop scheduling processes.
- bootstrap is a lot safer. Instead of sharing a statically compiled in
stack a number of times (which has caused lots of problems) and then
abandoning it, we use the idle context to boot the AP's directly. This
should help >2 cpu support since the bootlock stuff was in doubt.
- print physical apic id in traps.. helps identify private pages getting
out of sync. (You don't want to know how much hair I tore out with this!)

More cleanup to follow, this is more of a checkpoint than a
'finished' thing.


# 10a1aa05 25-Aug-1997 Bruce Evans <bde@FreeBSD.org>

Finished (?) support for DISABLE_PSE option. 2-3MB of kernel vm was sometimes
wasted.

Fixed type mismatches for functions with vm_prot_t's as args. vm_prot_t
is u_char, so the prototypes should have used promoteof(u_char) to match
the old-style function definitions. They use just vm_prot_t. This depends
on gcc features to work. I fixed the definitions since this is easiest.
The correct fix may be to change vm_prot_t to u_int, to optimize for time
instead of space.

Removed a stale comment.


# d3d1eb99 06-Aug-1997 John Dyson <dyson@FreeBSD.org>

Fix the DDB breakpoint code when using the 4MB page support.


# f1c1c5b5 06-Aug-1997 John Dyson <dyson@FreeBSD.org>

More vm_zone cleanup. The sysctl now accounts for items better, and
counts the number of allocations.


# 0d65e566 05-Aug-1997 John Dyson <dyson@FreeBSD.org>

Another attempt at cleaning up the new memory allocator.


# b79933eb 05-Aug-1997 John Dyson <dyson@FreeBSD.org>

Fix some bugs, document vm_zone better. Add copyright to vm_zone.h. Use
the new zone code in pmap.c so that we can get rid of the ugly ad-hoc
allocations in pmap.c.


# b25b051b 04-Aug-1997 John Dyson <dyson@FreeBSD.org>

Modify pmap to use our new memory allocator.


# f6363c84 04-Aug-1997 John Dyson <dyson@FreeBSD.org>

Slightly reorder some operations so that the main processor gets global
mappings early on.


# de5858ab 04-Aug-1997 John Dyson <dyson@FreeBSD.org>

Remove the PMAP_PVLIST conditionals in pmap.*, and another unneeded define.


# 322d7a88 20-Jul-1997 John Dyson <dyson@FreeBSD.org>

Fix a crash that has manifest itself while running X after the 4MB
page upgrades.


# e31521c3 20-Jul-1997 Bruce Evans <bde@FreeBSD.org>

Removed unused #includes.


# 78342719 17-Jul-1997 John Dyson <dyson@FreeBSD.org>

Hopefully fix a few problems that could cause hangs in SMP mode.
1) Make sure that the region mapped by a 4MB page is
properly aligned.
2) Don't turn on the PG_G flag in locore for SMP. I plan
to do that later in startup anyway.
3) Make sure the 2nd processor has PSE enabled, so that 4MB
pages don't hose it.

We don't use PG_G yet on SMP -- there is work to be done to make that
work correctly. It isn't that important anyway...


# 0a0a85b3 16-Jul-1997 John Dyson <dyson@FreeBSD.org>

Add support for 4MB pages. This includes the .text, .data, .data parts
of the kernel, and also most of the dynamic parts of the kernel. Additionally,
4MB pages will be allocated for display buffers as appropriate (only.)

The 4MB support for SMP isn't complete, but doesn't interfere with operation
either.


# a4ec81c7 25-Jun-1997 Tor Egge <tegge@FreeBSD.org>

Allow kernel configuration file to override PMAP_SHPGPERPROC. The default
value (200) is too low in some environments, causing a fatal
"panic: get_pv_entry: cannot get a pv_entry_t". The same panic might
still occur due to temporary shortage of free physical memory
(cf. PR i386/2431).


# b3196e4b 22-Jun-1997 Peter Wemm <peter@FreeBSD.org>

Preliminary support for per-cpu data pages.

This eliminates a lot of #ifdef SMP type code. Things like _curproc reside
in a data page that is unique on each cpu, eliminating the expensive macros
like: #define curproc (SMPcurproc[cpunumber()])

There are some unresolved bootstrap and address space sharing issues at
present, but Steve is waiting on this for other work. There is still some
strictly temporary code present that isn't exactly pretty.

This is part of a larger change that has run into some bumps, this part is
standalone so it should be safe. The temporary code goes away when the
full idle cpu support is finished.

Reviewed by: fsmp, dyson


# a8baaafd 28-May-1997 Steve Passe <fsmp@FreeBSD.org>

Code such as apic_base[APIC_ID] converted to lapic__id

Changes to pmap.c for lapic_t lapic && ioapic_t ioapic pointers,
currently equal to apic_base && io_apic_base, will stand alone with the
private page mapping.


# 288e2230 28-May-1997 Peter Wemm <peter@FreeBSD.org>

remove no longer needed opt_smp.h includes


# a8a74574 26-Apr-1997 Peter Wemm <peter@FreeBSD.org>

Whoops.. We forgot to turn off the 4MB Virtual==Physical mapping at address
zero from bootstrap in the non-SMP case.

Noticed by: bde


# 477a642c 26-Apr-1997 Peter Wemm <peter@FreeBSD.org>

Man the liferafts! Here comes the long awaited SMP -> -current merge!

There are various options documented in i386/conf/LINT, there is more to
come over the next few days.

The kernel should run pretty much "as before" without the options to
activate SMP mode.

There are a handful of known "loose ends" that need to be fixed, but
have been put off since the SMP kernel is in a moderately good condition
at the moment.

This commit is the result of the tinkering and testing over the last 14
months by many people. A special thanks to Steve Passe for implementing
the APIC code!


# aec17d50 12-Apr-1997 John Dyson <dyson@FreeBSD.org>

The pmap code was too generous in the allocation of kva space for
the pv entries. This problem has become obvious due to the increase
in the size of the pv entries. We need to create a more intelligent
policy for pv entry management eventually.
Submitted by: David Greenman <dg@freebsd.org>


# 5856e12e 12-Apr-1997 John Dyson <dyson@FreeBSD.org>

Fully implement vfork. Vfork is now much much faster than even our
fork. (On my machine, fork is about 240usecs, vfork is 78usecs.)

Implement rfork(!RFPROC !RFMEM), which allows a thread to divorce its memory
from the other threads of a group.

Implement rfork(!RFPROC RFCFDG), which closes all file descriptors, eliminating
possible existing shares with other threads/processes.

Implement rfork(!RFPROC RFFDG), which divorces the file descriptors for a
thread from the rest of the group.

Fix the case where a thread does an exec. It is almost nonsense for a thread
to modify the other threads address space by an exec, so we
now automatically divorce the address space before modifying it.


# a2a1c95c 07-Apr-1997 Peter Wemm <peter@FreeBSD.org>

The biggie: Get rid of the UPAGES from the top of the per-process address
space. (!)

Have each process use the kernel stack and pcb in the kvm space. Since
the stacks are at a different address, we cannot copy the stack at fork()
and allow the child to return up through the function call tree to return
to user mode - create a new execution context and have the new process
begin executing from cpu_switch() and go to user mode directly.
In theory this should speed up fork a bit.

Context switch the tss_esp0 pointer in the common tss. This is a lot
simpler since than swithching the gdt[GPROC0_SEL].sd.sd_base pointer
to each process's tss since the esp0 pointer is a 32 bit pointer, and the
sd_base setting is split into three different bit sections at non-aligned
boundaries and requires a lot of twiddling to reset.

The 8K of memory at the top of the process space is now empty, and unmapped
(and unmappable, it's higher than VM_MAXUSER_ADDRESS).

Simplity the pmap code to manage process contexts, we no longer have to
double map the UPAGES, this simplifies and should measuably speed up fork().

The following parts came from John Dyson:

Set PG_G on the UPAGES that are now in kernel context, and invalidate
them when swapping them out.

Move the upages object (upobj) from the vmspace to the proc structure.

Now that the UPAGES (pcb and kernel stack) are out of user space, make
rfork(..RFMEM..) do what was intended by sharing the vmspace
entirely via reference counting rather than simply inheriting the mappings.


# 6875d254 22-Feb-1997 Peter Wemm <peter@FreeBSD.org>

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 996c772f 09-Feb-1997 John Dyson <dyson@FreeBSD.org>

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 3def4913 30-Jan-1997 David Greenman <dg@FreeBSD.org>

Removed unnecessary PG_N flag from device memory mappings. This is handled
by the CPU/chipset already and was apparantly triggering a hardware bug that
causes strange parity errors.


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# b447ce90 11-Jan-1997 John Dyson <dyson@FreeBSD.org>

When we changed pmap_protect to support adding the writeable
attribute to a page range, we forgot to set the PG_WRITEABLE
flag in the vm_page_t. This fixes that problem.


# 9b5a5d81 11-Jan-1997 John Dyson <dyson@FreeBSD.org>

Prepare better for multi-platform by eliminating another required
pmap routine (pmap_is_referenced.) Upper level recoded to use
pmap_ts_referenced.


# f486bc1d 28-Dec-1996 John Dyson <dyson@FreeBSD.org>

Allow pmap_protect to increase permissions. This mod can eliminate
the need for unnecessary vm_faults.
Submitted by: Alan Cox <alc@cs.rice.edu>


# d22671dc 10-Nov-1996 John Dyson <dyson@FreeBSD.org>

Support the PG_G flag on Pentium-Pro processors. This pretty
much eliminates the unnecessary unmapping of the kernel during
context switches and during invtlb...


# 1d6ccf9c 07-Nov-1996 Joerg Wunsch <joerg@FreeBSD.org>

Fix the message buffer mapping. This actually allows to increase
the message buffer size in <sys/msgbuf.h>.

Reviewed by: davidg,joerg
Submitted by: bde


# 5c2a644a 02-Nov-1996 John Dyson <dyson@FreeBSD.org>

Fix a problem with running down processes that have left wired
mappings with mlock. This problem only occurred because of the
quick unmap code not respecting the wired-ness of pages in the
process. In the future, we need to eliminate the dependency
intrinsic to the design of the code that wired pages actually
be mapped. It is kind-of bogus not to have wired pages mapped,
but it is also a weakness for the code to fall flat because
of a missing page.

This show fix a problem that Tor Egge has been having, and also
should be included into 2.2-RELEASE.


# 845c4ec4 22-Oct-1996 John Dyson <dyson@FreeBSD.org>

Account for the UPAGES in the same way as before moving the MD code
from vm_glue into pmap.c. Now RSS should appear to be the same as before.


# 675878e7 14-Oct-1996 John Dyson <dyson@FreeBSD.org>

Move much of the machine dependent code from vm_glue.c into
pmap.c. Along with the improved organization, small proc fork
performance is now about 5%-10% faster.


# 8f3a9a1b 12-Oct-1996 John Dyson <dyson@FreeBSD.org>

Minor optimization for final rundown of a pmap.


# 9d3fbbb5 12-Oct-1996 John Dyson <dyson@FreeBSD.org>

Performance optimizations. One of which was meant to go in before the
previous snap. Specifically, kern_exit and kern_exec now makes a
call into the pmap module to do a very fast removal of pages from the
address space. Additionally, the pmap module now updates the PG_MAPPED
and PG_WRITABLE flags. This is an optional optimization, but helpful
on the X86.


# da2186af 12-Oct-1996 Bruce Evans <bde@FreeBSD.org>

Cleaned up:
- fixed a sloppy common-style declaration.
- removed an unused macro.
- moved once-used macros to the one file where they are used.
- removed unused forward struct declarations.
- removed __pure.
- declared inline functions as inline in their prototype as well
as in theire definition (gcc unfortunately allows the prototype
to be inconsistent).
- staticized.


# c20b324b 09-Oct-1996 Bruce Evans <bde@FreeBSD.org>

Put I*86_CPU defines in opt_cpu.h.


# 27e9b35e 28-Sep-1996 John Dyson <dyson@FreeBSD.org>

Essentially rename pmap_update to be invltlb. It is a very machine
dependent operation, and not really a correct name. invltlb and invlpg
are more descriptive, and in the case of invlpg, a real opcode.

Additionally, fix the tlb management code for 386 machines.


# f53687f7 28-Sep-1996 Bruce Evans <bde@FreeBSD.org>

Restored my change in rev.1.119 which was clobbered by the previous commit.


# 9299d3c4 27-Sep-1996 John Dyson <dyson@FreeBSD.org>

Move pmap_update_1pg to cpufunc.h. Additionally,
use the invlpg opcode instead of the nasty looking .byte directives.
There are some other minor micro-level code improvements to pmap.c


# c254b6e8 13-Sep-1996 Bruce Evans <bde@FreeBSD.org>

Made debugging code (pmap_pvdump()) compile again so that I can test LINT.
I don't know if it actually works.


# dba940b4 11-Sep-1996 John Dyson <dyson@FreeBSD.org>

Primarily a fix so that pages are properly tracked for being
modified. Pages that are removed by the pageout daemon were
the worst affected. Additionally, numerous minor cleanups,
including better handling of busy page table pages. This
commit fixes the worst of the pmap problems recently introduced.


# 690db31d 10-Sep-1996 John Dyson <dyson@FreeBSD.org>

A minor fix to the new pmap code. This might not fix the global problems
with the last major pmap commits.


# 5070c7f8 08-Sep-1996 John Dyson <dyson@FreeBSD.org>

Addition of page coloring support. Various levels of coloring are afforded.
The default level works with minimal overhead, but one can also enable
full, efficient use of a 512K cache. (Parameters can be generated
to support arbitrary cache sizes also.)


# b8e251a5 08-Sep-1996 John Dyson <dyson@FreeBSD.org>

Improve the scalability of certain pmap operations.


# 67bf6868 29-Jul-1996 John Dyson <dyson@FreeBSD.org>

Backed out the recent changes/enhancements to the VM code. The
problem with the 'shell scripts' was found, but there was a 'strange'
problem found with a 486 laptop that we could not find. This commit
backs the code back to 25-jul, and will be re-entered after the snapshot
in smaller (more easily tested) chunks.


# 78d43461 29-Jul-1996 John Dyson <dyson@FreeBSD.org>

Fix a problem with a DEBUG section of code.


# b7fb3572 28-Jul-1996 John Dyson <dyson@FreeBSD.org>

Fix an error in statement order in pmap_remove_pages, remove the pmap
pte hint (for now), and general code cleanup.


# da54aa7f 28-Jul-1996 John Dyson <dyson@FreeBSD.org>

Fix a problem that pmap update was not being done for kernel_pmap. Also
remove some (currently) gratuitious tests for PG_V... This bug could
have caused various anomolous (temporary) behavior.


# 4f4d35ed 26-Jul-1996 John Dyson <dyson@FreeBSD.org>

This commit is meant to solve a couple of VM system problems or
performance issues.

1) The pmap module has had too many inlines, and so the
object file is simply bigger than it needs to be.
Some common code is also merged into subroutines.
2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls.
Unfortunately, a few have needed to be added also.
The removal caused the need for more vm_page_lookups.
I added lookup hints to minimize the need for the
page table lookup operations.
3) Removal of some bogus performance improvements, that
mostly made the code more complex (tracking individual
page table page updates unnecessarily). Those improvements
actually hurt 386 processors perf (not that people who
worry about perf use 386 processors anymore :-)).
4) Changed pv queue manipulations/structures to be TAILQ's.
5) The pv queue code has had some performance problems since
day one. Some significant scalability issues are resolved
by threading the pv entries from the pmap AND the physical
address instead of just the physical address. This makes
certain pmap operations run much faster. This does
not affect most micro-benchmarks, but should help loaded system
performance *significantly*. DG helped and came up with most
of the solution for this one.
6) Most if not all pmap bit operations follow the pattern:
pmap_test_bit();
pmap_clear_bit();
That made for twice the necessary pv list traversal. The
pmap interface now supports only pmap_tc_bit type operations:
pmap_[test/clear]_modified, pmap_[test/clear]_referenced.
Additionally, the modified routine now takes a vm_page_t arg
instead of a phys address. This eliminates a PHYS_TO_VM_PAGE
operation.
7) Several rewrites of routines that contain redundant code to
use common routines, so that there is a greater likelihood of
keeping the cache footprint smaller.


# 73571d2d 12-Jul-1996 Bruce Evans <bde@FreeBSD.org>

Removed "optimization" using gcc's builtin memcpy instead of bcopy.
There is little difference now since the amount copied is large,
and bcopy will become much faster on some machines.


# f4346724 25-Jun-1996 John Dyson <dyson@FreeBSD.org>

When page table pages were removed from process address space, the
resident page stats were not being decremented. This mode corrects
that problem.


# cb87c9be 24-Jun-1996 John Dyson <dyson@FreeBSD.org>

Limit the scan for preloading pte's to the end of an object.


# 9b2b0822 17-Jun-1996 Bruce Evans <bde@FreeBSD.org>

Removed unused #includes of <i386/isa/icu.h> and <i386/isa/icu.h>. icu.h
is only used by the icu support modules and by a few drivers that know
too much about the icu (most only use it to convert `n' to `IRQn'). isa.h
is only used by ioconf.c and by a few drivers that know too much about
isa addresses (a few have to, because config is deficient).


# ef743ce6 16-Jun-1996 John Dyson <dyson@FreeBSD.org>

Several bugfixes/improvements:
1) Make it much less likely to miss a wakeup in vm_page_free_wakeup
2) Create a new entry point into pmap: pmap_ts_referenced, eliminates
the need to scan the pv lists twice in many cases. Perhaps there
is alot more to do here to work on minimizing pv list manipulation
3) Minor improvements to vm_pageout including the use of pmap_ts_ref.
4) Major changes and code improvement to pmap. This code has had
several serious bugs in page table page manipulation. In order
to simplify the problem, and hopefully solve it for once and all,
page table pages are no longer "managed" with the pv list stuff.
Page table pages are only (mapped and held/wired) or
(free and unused) now. Page table pages are never inactive,
active or cached. These changes have probably fixed the
hold count problems, but if they haven't, then the code is
simpler anyway for future bugfixing.
5) The pmap code has been sorely in need of re-organization, and I
have taken a first (of probably many) steps. Please tell me
if you have any ideas.


# 419702a4 12-Jun-1996 John Dyson <dyson@FreeBSD.org>

Fix a very significant cnt.v_wire_count leak in vm_page.c, and some
minor leaks in pmap.c. Bruce Evans made me aware of this problem.


# c23670e2 11-Jun-1996 Gary Palmer <gpalmer@FreeBSD.org>

Clean up -Wunused warnings.

Reviewed by: bde


# 886d3e11 08-Jun-1996 John Dyson <dyson@FreeBSD.org>

Adjust the threshold for blocking on movement of pages from the cache
queue in vm_fault.

Move the PG_BUSY in vm_fault to the correct place.

Remove redundant/unnecessary code in pmap.c.

Properly block on rundown of page table pages, if they are busy.

I think that the VM system is in pretty good shape now, and the following
individuals (among others, in no particular order) have helped with this
recent bunch of bugs, thanks! If I left anyone out, I apologize!

Stephen McKay, Stephen Hocking, Eric J. Chet, Dan O'Brien, James Raynard,
Marc Fournier.


# 475dca82 06-Jun-1996 John Dyson <dyson@FreeBSD.org>

Fix a bug in the pmap_object_init_pt routine that pages aren't taken
from the cache queue before being mapped into the process.


# 3ccd871c 05-Jun-1996 John Dyson <dyson@FreeBSD.org>

I missed a case of the page table page dirty-bit fix.


# 6b6f0008 04-Jun-1996 John Dyson <dyson@FreeBSD.org>

Keep page-table pages from ever being sensed as dirty. This should fix
some problems with the page-table page management code, since it can't
deal with the notion of page-table pages being paged out or in transit.
Also, clean up some stylistic issues per some suggestions from
Stephen McKay.


# 3943b4ea 02-Jun-1996 John Dyson <dyson@FreeBSD.org>

Don't carry the modified or referenced bits through to the child
process during pmap_copy. This minimizes unnecessary swapping or creation of
swap space. If there is a hold_count flaw for page-table
pages, clear the page before freeing it to lessen the chance of a system
crash -- this is a robustness thing only, NOT a fix.


# c2b39c99 01-Jun-1996 John Dyson <dyson@FreeBSD.org>

Fix the problem with pmap_copy that breaks X in small memory machines. Also
close some windows that are opened up by page table allocations. The
prefaulting code no longer uses hold counts, but now uses the busy
flag for synchronization.


# f35329ac 30-May-1996 John Dyson <dyson@FreeBSD.org>

This commit is dual-purpose, to fix more of the pageout daemon
queue corruption problems, and to apply Gary Palmer's code cleanups.
David Greenman helped with these problems also. There is still
a hang problem using X in small memory machines.


# 25695129 28-May-1996 John Dyson <dyson@FreeBSD.org>

The wrong address (pindex) was being used for the page table directory. No
negative side effects right now, but just a clean-up.


# cd61f6c5 22-May-1996 Peter Wemm <peter@FreeBSD.org>

Fix harmless warning.. pmap_nw_modified was not having it's arg
cast to pt_entry_t like the others inside the DIAGNOSTIC code.


# 93d52b3c 21-May-1996 John Dyson <dyson@FreeBSD.org>

A serious error in pmap.c(pmap_remove) is corrected by this. When
comparing the PTD pointers, they needed to be masked by PG_FRAME, and
they weren't. Also, the "improved" non-386 code wasn't really an
improvement, so I simplified and fixed the code. This might have
caused some of the panics caused by the VM megacommit.


# ed48f831 20-May-1996 John Dyson <dyson@FreeBSD.org>

To quote Stephen McKay: pmap_copy is a complex NOP at this moment :-).

With this fix from Stephen, we are getting the target fork performance
that I have been trying to attain: P5-166, before the mega-commit: 700-800usecs,
after: 600usecs, with Stephen's fix: 500usecs!!! Also, this could be the
solution of some strange panic problems...
Reviewed by: dyson@freebsd.org
Submitted by: Stephen McKay <syssgm@devetir.qld.gov.au>


# 867a482d 19-May-1996 John Dyson <dyson@FreeBSD.org>

Initial support for mincore and madvise. Both are almost fully
supported, except madvise does not page in with MADV_WILLNEED, and
MADV_DONTNEED doesn't force dirty pages out.


# b18bfc3d 17-May-1996 John Dyson <dyson@FreeBSD.org>

This set of commits to the VM system does the following, and contain
contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>,
Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me:

More usage of the TAILQ macros. Additional minor fix to queue.h.
Performance enhancements to the pageout daemon.
Addition of a wait in the case that the pageout daemon
has to run immediately.
Slightly modify the pageout algorithm.
Significant revamp of the pmap/fork code:
1) PTE's and UPAGES's are NO LONGER in the process's map.
2) PTE's and UPAGES's reside in their own objects.
3) TOTAL elimination of recursive page table pagefaults.
4) The page directory now resides in the PTE object.
5) Implemented pmap_copy, thereby speeding up fork time.
6) Changed the pv entries so that the head is a pointer
and not an entire entry.
7) Significant cleanup of pmap_protect, and pmap_remove.
8) Removed significant amounts of machine dependent
fork code from vm_glue. Pushed much of that code into
the machine dependent pmap module.
9) Support more completely the reuse of already zeroed
pages (Page table pages and page directories) as being
already zeroed.
Performance and code cleanups in vm_map:
1) Improved and simplified allocation of map entries.
2) Improved vm_map_copy code.
3) Corrected some minor problems in the simplify code.
Implemented splvm (combo of splbio and splimp.) The VM code now
seldom uses splhigh.
Improved the speed of and simplified kmem_malloc.
Minor mod to vm_fault to avoid using pre-zeroed pages in the case
of objects with backing objects along with the already
existant condition of having a vnode. (If there is a backing
object, there will likely be a COW... With a COW, it isn't
necessary to start with a pre-zeroed page.)
Minor reorg of source to perhaps improve locality of ref.


# aa8de40a 03-May-1996 Poul-Henning Kamp <phk@FreeBSD.org>

Another sweep over the pmap/vm macros, this time with more focus on
the usage. I'm not satisfied with the naming, but now at least there is
less bogus stuff around.


# 5084d10d 02-May-1996 Poul-Henning Kamp <phk@FreeBSD.org>

Move atdevbase out of locore.s and into machdep.c
Macroize locore.s' page table setup even more, now it's almost readable.
Rename PG_U to PG_A (so that I can...)
Rename PG_u to PG_U. "PG_u" was just too ugly...
Remove some unused vars in pmap.c
Remove PG_KR and PG_KW
Remove SSIZE
Remove SINCR
Remove BTOPKERNBASE

This concludes my spring cleaning, modulus any bug fixes for messes I
have made on the way.

(Funny to be back here in pmap.c, that's where my first significant
contribution to 386BSD was... :-)


# e911eafc 02-May-1996 Poul-Henning Kamp <phk@FreeBSD.org>

removed:
CLBYTES PD_SHIFT PGSHIFT NBPG PGOFSET CLSIZELOG2 CLSIZE pdei()
ptei() kvtopte() ptetov() ispt() ptetoav() &c &c
new:
NPDEPG

Major macro cleanup.


# 7847cec1 21-Apr-1996 John Dyson <dyson@FreeBSD.org>

This fixes a troubling oversight in some of the pmap code enhancements.
One of the manifiestations of the problem includes the -4 RSS problem
in ps.

Reviewed by: dyson
Submitted by: Stephen McKay <syssgm@devetir.qld.gov.au>


# 07b10591 06-Apr-1996 John Dyson <dyson@FreeBSD.org>

Major cleanups for the pmap code.


# 909e5e0e 31-Mar-1996 David Greenman <dg@FreeBSD.org>

Change if/goto into a while loop.


# 4e489ec4 27-Mar-1996 John Dyson <dyson@FreeBSD.org>

Remove a now unnecessary prototype from pmap.c. Also remove now
unnecessary vm_fault's of page table pages in trap.c.


# 208bfdc9 27-Mar-1996 John Dyson <dyson@FreeBSD.org>

Significant code cleanup, and some performance improvement. Also,
mlock will now work properly without killing the system.


# 3ce8e60f 12-Mar-1996 John Dyson <dyson@FreeBSD.org>

Make sure that we pmap_update AFTER modifying the page table entries.
The P6 can do a serious job of reordering code, and our stuff could
execute incorrectly.


# 6ad13830 10-Mar-1996 Jeffrey Hsu <hsu@FreeBSD.org>

For Lite2: proc LIST changes.
Reviewed by: david & bde


# 874308f7 10-Mar-1996 John Dyson <dyson@FreeBSD.org>

Improved efficiency in pmap_remove, and also remove some of the pmap_update
optimizations that were probably incorrect.


# 9212ebc6 09-Mar-1996 John Dyson <dyson@FreeBSD.org>

Correct some new and older lurking bugs. Hold count wasn't being
handled correctly. Fix some incorrect code that was included
to improve performance. Significantly simplify the pmap_use_pt and
pmap_unuse_pt subroutines. Add some more diagnostic code.


# d6673cba 24-Feb-1996 John Dyson <dyson@FreeBSD.org>

Re-insert a missing pmap_remove operation.


# 3eb77c83 24-Feb-1996 John Dyson <dyson@FreeBSD.org>

Fix a problem with tracking the modified bit. Eliminate the
ugly inline-asm code, and speed up the page-table-page tracking.


# 267173e7 04-Feb-1996 David Greenman <dg@FreeBSD.org>

Rewrote cpu_fork so that it doesn't use pmap_activate, and removed
pmap_activate since it's not used anymore. Changed cpu_fork so that
it uses one line of inline assembly rather than calling mvesp() to
get the current stack pointer. Removed mvesp() since it is no longer
being used.


# 1a46737f 19-Jan-1996 Peter Wemm <peter@FreeBSD.org>

Some trivial fixes to get it to compile again, plus some new lint:
- cpuclass should be cpu_class
- CPUCLASS_I386 should be CPUCLASS_386
(^^ those only show up if you compile for i386)
- two missing prototypes on new functions
- one missing static


# bd7e5f99 18-Jan-1996 John Dyson <dyson@FreeBSD.org>

Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
overhead for merged cache.
Efficiency improvement for vfs_cluster. It used to do alot of redundant
calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files. Additionally,
fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources. The pageout code
will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
that happens every 30seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
buffers.


# e3900906 22-Dec-1995 Bruce Evans <bde@FreeBSD.org>

Staticized code that was hidden by `#ifdef DEBUG'.


# f2c6b65b 17-Dec-1995 Bruce Evans <bde@FreeBSD.org>

Fixed 1TB filesize changes. Some pindexes had bogus names and types
but worked because vm_pindex_t is indistinuishable from vm_offset_t.


# c3741af9 14-Dec-1995 Bruce Evans <bde@FreeBSD.org>

Added a prototype. Merged prototype lists.


# a316d390 10-Dec-1995 John Dyson <dyson@FreeBSD.org>

Changes to support 1Tb filesizes. Pages are now named by an
(object,index) pair instead of (object,offset) pair.


# 87b91157 10-Dec-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Staticize and cleanup.
remove a TON of #includes from machdep.


# efeaf95a 06-Dec-1995 David Greenman <dg@FreeBSD.org>

Untangled the vm.h include file spaghetti.


# 8ae2aed2 03-Dec-1995 Bruce Evans <bde@FreeBSD.org>

Completed function declarations and/or added prototypes.


# 07658526 19-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unused vars.


# 63017f04 22-Oct-1995 David Greenman <dg@FreeBSD.org>

Remove PG_W bit setting in some cases where it should not be set.

Submitted by: John Dyson <dyson>


# b596ee8d 22-Oct-1995 David Greenman <dg@FreeBSD.org>

More improvements to the logic for modify-bit checking. Removed
pmap_prefault() code as we don't plan to use it at this point in time.

Submitted by: John Dyson <dyson>


# 6928ec33 21-Oct-1995 David Greenman <dg@FreeBSD.org>

Simplified some expressions.


# 0937c08c 15-Sep-1995 David Greenman <dg@FreeBSD.org>

1) Killed 'BSDVM_COMPAT'.
2) Killed i386pagesperpage as it is not used by anything.
3) Fixed benign miscalculations in pmap_bootstrap().
4) Moved allocation of ISA DMA memory to machdep.c.
5) Removed bogus vm_map_find()'s in pmap_init() - the entire range was
already allocated kmem_init().
6) Added some comments.

virual_avail is still miscalculated NKPT*NBPG too large, but in order to
fix this properly requires moving the variable initialization into locore.s.
Some other day.


# 6b837e5d 28-Jul-1995 David Greenman <dg@FreeBSD.org>

Fixed bug I introduced with the memory-size code rewrite that broke
floppy DMA buffers...use avail_start not "first". Removed duplicate
(and wrong) declaration of phys_avail[].

Submitted by: Bruce Evans, but fixed differently by me.


# 24a1cce3 13-Jul-1995 David Greenman <dg@FreeBSD.org>

NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!

Much needed overhaul of the VM system. Included in this first round of
changes:

1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".

2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
escentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.

3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.

4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.

5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.

6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.

7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.

8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.

9) Some almost useless debugging code removed.

10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure escentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.

11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.

12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although will need some additional decision code to do
this, of course).

13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.

14) (dyson) Sharing maps have been removed. It's marginal usefulness in a
threads design can be worked around in other ways. Both #12 and #13
were done to simplify the code and improve readability and maintain-
ability. (As were most all of these changes)

TODO:

1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.

2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.

3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.

4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.

5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).


# 9b2e5354 30-May-1995 Rodney W. Grimes <rgrimes@FreeBSD.org>

Remove trailing whitespace.


# b2b795f0 11-May-1995 Rodney W. Grimes <rgrimes@FreeBSD.org>

Fix -Wformat warnings from LINT kernel.


# fcb5be87 08-Apr-1995 David Greenman <dg@FreeBSD.org>

Cosmetic changes.


# 7b0047e2 30-Mar-1995 David Greenman <dg@FreeBSD.org>

Made pmap_testbit a static function.


# 880d1d84 26-Mar-1995 David Greenman <dg@FreeBSD.org>

Changed pmap_changebit() into a static function as it always should
have been.

Submitted by: John Dyson


# b5e8ce9f 16-Mar-1995 Bruce Evans <bde@FreeBSD.org>

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


# 90c47808 10-Mar-1995 David Greenman <dg@FreeBSD.org>

Removed unnecessary routines vm_get_pmap() and vm_put_pmap().
kmem_alloc() returns zero filled memory, so no need to explicitly
bzero() it.


# fde2cdc4 01-Mar-1995 David Greenman <dg@FreeBSD.org>

Various changes from John and myself that do the following:

New functions create - vm_object_pip_wakeup and pagedaemon_wakeup that
are used to reduce the actual number of wakeups.
New function vm_page_protect which is used in conjuction with some new
page flags to reduce the number of calls to pmap_page_protect.
Minor changes to reduce unnecessary spl nesting.
Rewrote vm_page_alloc() to improve readability.
Various other mostly cosmetic changes.


# 550f8550 25-Feb-1995 Bruce Evans <bde@FreeBSD.org>

Replace all remaining instances of `i386/include' by `machine' and fix
nearby #include inconsistencies.


# 7389231d 14-Feb-1995 David Greenman <dg@FreeBSD.org>

Killed the pmap_use_pt and pmap_unuse_pt prototypes as they are now in
machine/pmap.h.


# 87bc4e69 02-Feb-1995 David Greenman <dg@FreeBSD.org>

Mostly cosmetic changes. Use KERNBASE instead of UPT_MAX_ADDRESS in
some comparisons as it is more correct (we want the kernel page tables
included).
Reorganized some of the expressions for efficiency.
Fixed the new pmap_prefault() routine - it would sometimes pick up the
wrong page if the page in the shadow was present but the page in object
was paged out. The routine remains unused and commented out, however.
Explicitly free zero reference count page tables (rather than waiting
for the pagedaemon to do it).

Submitted by: John Dyson


# 7a642ab5 26-Jan-1995 David Greenman <dg@FreeBSD.org>

Fix from Doug Rabson for a panic related to not initializing the kernel's
PTD.

Submitted by: John Dyson


# d3a10e2c 25-Jan-1995 David Greenman <dg@FreeBSD.org>

Comment out pmap_prefault for the time being (perhaps until after 2.1).
The object_init_pt routine is still enabled and used, however, and this
is where most of the 'pre-faulting' performance improvement comes from.


# 3edf89fe 25-Jan-1995 David Greenman <dg@FreeBSD.org>

Make sure that the pages being 'pre-faulted' are currently on a queue.


# 6100ed1d 25-Jan-1995 David Greenman <dg@FreeBSD.org>

Be a bit less fast and loose about setting non-cacheablity of pages.


# fbdfe8ac 24-Jan-1995 David Greenman <dg@FreeBSD.org>

Changed buffer allocation policy (machdep.c)
Moved various pmap 'bit' test/set functions back into real functions; gcc
generates better code at the expense of more of it. (pmap.c)
Fixed a deadlock problem with pv entry allocations (pmap.c)
Added a new, optional function 'pmap_prefault' that does clustered page
table preloading (pmap.c)
Changed the way that page tables are held onto (trap.c).

Submitted by: John Dyson


# 7082ec8a 15-Jan-1995 David Greenman <dg@FreeBSD.org>

Fixed some page table reference count problems; these changes may not be
complete, but should be closer to correct than before.


# 92c38536 13-Jan-1995 David Greenman <dg@FreeBSD.org>

Add missing object_lock/unlock.


# 0d94caff 09-Jan-1995 David Greenman <dg@FreeBSD.org>

These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.

The majority of the merged VM/cache work is by John Dyson.

The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.

vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.

vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.

vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.

vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.

vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.

pmap.c vm_map.c
Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of
kernel PTs.

vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.

proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.

swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.

machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.

machdep.c, kern_exec.c, vm_glue.c
Implemented a seperate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.

ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.

Submitted by: John Dyson and David Greenman


# 94101b52 18-Dec-1994 David Greenman <dg@FreeBSD.org>

Move page_unhold's in pmap_object_init_pt down one line to gard against
a potential race condition.


# 931bde7f 17-Dec-1994 David Greenman <dg@FreeBSD.org>

Check for PG_FAKE too in pmap_object_init_pt.


# 3fb3086e 08-Oct-1994 Poul-Henning Kamp <phk@FreeBSD.org>

db_disasm.c: Unused var zapped.
pmap.c: tons of unused vars zapped, various other warnings silenced.
trap.c: unused vars zapped.
vm_machdep.c: A wrong argument, which by chance did the right thing, was
corrected.


# df9ab304 16-Sep-1994 David Greenman <dg@FreeBSD.org>

Removed inclusion of pio.h and cpufunc.h (cpufunc.h is included from
systm.h). Merged functionality of pio.h into cpufunc.h. Cleaned up some
related code.


# 9cbeeedd 03-Sep-1994 David Greenman <dg@FreeBSD.org>

Added pmap_mapdev() function to map device memory.


# 2c7a40c7 01-Sep-1994 David Greenman <dg@FreeBSD.org>

Removed all vestiges of tlbflush(). Replaced them with calls to pmap_update().
Made pmap_update an inline assembly function.


# f23b4c91 18-Aug-1994 Garrett Wollman <wollman@FreeBSD.org>

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


# f540b106 12-Aug-1994 Garrett Wollman <wollman@FreeBSD.org>

Change all #includes to follow the current Berkeley style. Some of these
``changes'' are actually not changes at all, but CVS sometimes has trouble
telling the difference.

This also includes support for second-directory compiles. This is not
quite complete yet, as `config' doesn't yet do the right thing. You can
still make it work trivially, however, by doing the following:

rm /sys/compile
mkdir /usr/obj/sys/compile
ln -s M-. /sys/compile
cd /sys/i386/conf
config MYKERNEL
cd ../../compile/MYKERNEL
ln -s /sys @
rm machine
ln -s @/i386/include machine
make depend
make


# 8339815f 07-Aug-1994 David Greenman <dg@FreeBSD.org>

Made pmap_kenter "TLB safe". ...and then removed all the pmap_updates that
are no longer needed because of this.


# a481f200 07-Aug-1994 David Greenman <dg@FreeBSD.org>

Provide support for upcoming merged VM/buffer cache, and fixed a few bugs
that haven't appeared to manifest themselves (yet).

Submitted by: John Dyson


# c87801fe 06-Aug-1994 David Greenman <dg@FreeBSD.org>

Fixed various prototype problems with the pmap functions and the subsequent
problems that fixing them caused.


# 16f62314 06-Aug-1994 David Greenman <dg@FreeBSD.org>

Incorporated post 1.1.5 work from John Dyson. This includes performance
improvements via the new routines pmap_qenter/pmap_qremove and pmap_kenter/
pmap_kremove. These routine allow fast mapping of pages for those
architectures that have "normal" MMUs. Also included is a fix to the
pageout daemon to properly check a queue end condition.

Submitted by: John Dyson


# d23d07ef 02-Aug-1994 David Greenman <dg@FreeBSD.org>

Merged in post-1.1.5 work done by myself and John Dyson. This includes:

me:
1) TLB flush optimization that effectively eliminates half of all of the
TLB flushes. This works by only flushing the TLB when a page is "present"
in memory (i.e. the valid bit is set in the page table entry). See section
5.3.5 of the Intel 386 Programmer's Reference Manual.
2) The handling of "CMAP" has been improved to catch attempts at multiple
simultaneous use.

John:
1) Added pmap_qenter/pmap_qremove functions for fast mapping of pages into
the kernel. This is for future optimizations and support for the upcoming
merged VM/buffer cache.

Reviewed by: John Dyson


# 26f9a767 25-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# bb508919 01-May-1994 David Greenman <dg@FreeBSD.org>

Removed some tlbflush optimizations as some of them were bogus and lead
to some strange behavior.


# 0e195446 20-Apr-1994 David Greenman <dg@FreeBSD.org>

Bug fixes and performance improvements from John Dyson and myself:

1) check va before clearing the page clean flag. Not doing so was
causing the vnode pager error 5 messages when paging from
NFS. (pmap.c)
2) put back interrupt protection in idle_loop. Bruce didn't think
it was necessary, John insists that it is (and I agree). (swtch.s)
3) various improvements to the clustering code (vm_machdep.c). It's
now enabled/used by default.
4) bad disk blocks are now handled properly when doing clustered IOs.
(wd.c, vm_machdep.c)
5) bogus bad block handling fixed in wd.c.
6) algorithm improvements to the pageout/pagescan daemons. It's amazing
how well 4MB machines work now.


# f690bbac 14-Apr-1994 David Greenman <dg@FreeBSD.org>

Changes from John Dyson and myself:

1) Removed all instances of disable_intr()/enable_intr() and changed
them back to splimp/splx. The previous method was done to improve
the performance, but Bruces recent changes to inline spl* have
made this unnecessary.
2) Cleaned up vm_machdep.c considerably. Probably fixed a few bugs, too.
3) Added a new mechanism for collecting page statistics - now done by
a new system process "pagescan". Previously this was done by the
pageout daemon, but this proved to be impractical.
4) Improved the page usage statistics gathering mechanism - performance is
much improved in small memory machines.
5) Modified mbuf.h to enable the support for an external free routine when
using mbuf clusters. Added appropriate glue in various places to
allow this to work.
6) Adapted a suggested change to the NFS code from Yuval Yurom to take
advantage of #5.
7) Added fault/swap statistics support.


# 6b4ac811 29-Mar-1994 David Greenman <dg@FreeBSD.org>

New routine "pmap_kenter", designed to take advantage of the special
case of the kernel pmap.


# 943a66f3 14-Mar-1994 David Greenman <dg@FreeBSD.org>

Performance improvements from John Dyson.

1) A new mechanism has been added to prevent pages from being paged
out called "vm_page_hold". Similar to vm_page_wire, but
much lower overhead.
2) Scheduling algorithm has been changed to improve interactive
performance.
3) Paging algorithm improved.
4) Some vnode and swap pager bugs fixed.


# 04f18356 07-Mar-1994 David Greenman <dg@FreeBSD.org>

1) "Pre-faulting" in of pages into process address space
Eliminates vm_fault overhead on process startup and
mmap referenced data for in-memory pages.

(process startup time using in-memory segments *much* faster)

2) Even more efficient pmap code. Code partially cleaned up.
More comments yet to follow.

(generally more efficient pte management)

3) Pageout clustering ( in addition to the FreeBSD V1.1 pagein
clustering.)

(much faster paging performance on non-write behind disk
subsystems, slightly faster performance on other systems.)

4) Slightly changed vm_pageout code for more efficiency and
better statistics. Also, resist swapout a little more.

(less likely to pageout a recently used page)

5) Slight improvement to the page table page trap efficiency.

(generally faster system VM fault performance)

6) Defer creation of unnamed anonymous regions pager until needed.

(speeds up shared memory bss creation)

7) Remove possible deadlock from swap_pager initialization.

8) Enhanced procfs to provide "vminfo" about vm objects and user
pmaps.

9) Increased MCLSHIFT/MCLBYTES from 2K to 4K to improve net &
socket performance and to prepare for things to come.

John Dyson
dyson@implode.root.com
David Greenman
davidg@root.com


# 2c194b2e 13-Feb-1994 David Greenman <dg@FreeBSD.org>

Fixed bug in handling of COW - the original code was bogus and it was
only accidental that it worked. Also, don't cache non-managed pages.


# 43ef94a9 09-Feb-1994 David Greenman <dg@FreeBSD.org>

Patch from John Dyson:

a pv chain was being traversed while interrupts were
fully enabled in pmap_remove_all ... this is bogus, and
has been fixed in pmap.c. (sorry for adding the splimp)


# 98446d4e 07-Feb-1994 David Greenman <dg@FreeBSD.org>

Fixes from John Dyson to fix out-of-memory hangs and other problems (such
as increased swap space usage) related to (incorrectly) paging out the
page tables.


# 8f64d25d 31-Jan-1994 David Greenman <dg@FreeBSD.org>

Added four pattern memory test routine that is done at startup.


# ec120393 30-Jan-1994 David Greenman <dg@FreeBSD.org>

VM system performance improvements from John Dyson and myself. The
following is a summary:

1) increased object cache back up to a more reasonable value.
2) removed old & bogus cruft from machdep.c (clearseg, copyseg,
physcopyseg, etc).
3) inlined many functions in pmap.c
4) changed "load_cr3(rcr3())" into tlbflush() and made tlbflush inline
assembly.
5) changed the way that modified pages are tracked - now vm_page struct
is kept updated directly - no more scanning page tables.
6) removed lots of unnecessary spl's
7) removed old unused functions from pmap.c
8) removed all use of page_size, page_shift, page_mask variables - replaced
with PAGE_ constants.
9) moved trunc/round_page, atop, ptoa, out of vm_param.h and into i386/
include/param.h, and optimized them.
10) numerous changes to sys/vm/ swap_pager, vnode_pager, pageout, fault
code to improve performance. LRU algorithm modified to be more
effective, read ahead/behind values tuned for better performance,
etc, etc...


# 5d7fe66e 26-Jan-1994 David Greenman <dg@FreeBSD.org>

Made pmap_is_managed a static inline function.


# d64f660f 17-Jan-1994 David Greenman <dg@FreeBSD.org>

Improvements mostly from John Dyson, with a little bit from me.

* Removed pmap_is_wired
* added extra cli/sti protection in idle (swtch.s)
* slight code improvement in trap.c
* added lots of comments
* improved paging and other algorithms in VM system


# 7f8cb368 14-Jan-1994 David Greenman <dg@FreeBSD.org>

"New" VM system from John Dyson & myself. For a run-down of the
major changes, see the log of any effected file in the sys/vm
directory (swap_pager.c for instance).


# aaf08d94 18-Dec-1993 Garrett Wollman <wollman@FreeBSD.org>

Make everything compile with -Wtraditional. Make it easier to distribute
a binary link-kit. Make all non-optional options (pagers, procfs) standard,
and update LINT to reflect new symtab requirements.

NB: -Wtraditional will henceforth be forgotten. This editing pass was
primarily intended to detect any constructions where the old code might
have been relying on traditional C semantics or syntax. These were all
fixed, and the result of fixing some of them means that -Wall is now a
realistic possibility within a few weeks.


# 6aa5e701 13-Dec-1993 David Greenman <dg@FreeBSD.org>

added some panics to catch the condition where pmap_pte returns null
- indicating that the page table page is non-resident.


# 381fe1aa 24-Nov-1993 Garrett Wollman <wollman@FreeBSD.org>

Make the LINT kernel compile with -W -Wreturn-type -Wcomment -Werror, and
add same (sans -Werror) to Makefile for future compilations.


# 0967373e 12-Nov-1993 David Greenman <dg@FreeBSD.org>

First steps in rewriting locore.s, and making info useful
when the machine panics.

i386/i386/locore.s:
1) got rid of most .set directives that were being used like
#define's, and replaced them with appropriate #define's in
the appropriate header files (accessed via genassym).
2) added comments to header inclusions and global definitions,
and global variables
3) replaced some hardcoded constants with cpp defines (such as
PDESIZE and others)
4) aligned all comments to the same column to make them easier to
read
5) moved macro definitions for ENTRY, ALIGN, NOP, etc. to
/sys/i386/include/asmacros.h
6) added #ifdef BDE_DEBUGGER around all of Bruce's debugger code
7) added new global '_KERNend' to store last location+1 of kernel
8) cleaned up zeroing of bss so that only bss is zeroed
9) fix zeroing of page tables so that it really does zero them all
- not just if they follow the bss.
10) rewrote page table initialization code so that 1) works correctly
and 2) write protects the kernel text by default
11) properly initialize the kernel page directory, upages, p0stack PT,
and page tables. The previous scheme was more than a bit
screwy.
12) change allocation of virtual area of IO hole so that it is
fixed at KERNBASE + 0xa0000. The previous scheme put it
right after the kernel page tables and then later expected
it to be at KERNBASE +0xa0000
13) change multiple bogus settings of user read/write of various
areas of kernel VM - including the IO hole; we should never
be accessing the IO hole in user mode through the kernel
page tables
14) split kernel support routines such as bcopy, bzero, copyin,
copyout, etc. into a seperate file 'support.s'
15) split swtch and related routines into a seperate 'swtch.s'
16) split routines related to traps, syscalls, and interrupts
into a seperate file 'exception.s'
17) remove some unused global variables from locore that got
inserted by Garrett when he pulled them out of some .h
files.

i386/isa/icu.s:
1) clean up global variable declarations
2) move in declaration of astpending and netisr

i386/i386/pmap.c:
1) fix calculation of virtual_avail. It previously was calculated
to be right in the middle of the kernel page tables - not
a good place to start allocating kernel VM.
2) properly allocate kernel page dir/tables etc out of kernel map
- previously only took out 2 pages.

i386/i386/machdep.c:
1) modify boot() to print a warning that the system will reboot in
PANIC_REBOOT_WAIT_TIME amount of seconds, and let the user
abort with a key on the console. The machine will wait for
ever if a key is typed before the reboot. The default is
15 seconds, but can be set to 0 to mean don't wait at all,
-1 to mean wait forever, or any positive value to wait for
that many seconds.
2) print "Rebooting..." just before doing it.

kern/subr_prf.c:
1) remove PANICWAIT as it is deprecated by the change to machdep.c

i386/i386/trap.c:
1) add table of trap type strings and use it to print a real trap/
panic message rather than just a number. Lot's of work to
be done here, but this is the first step. Symbolic traceback
is in the TODO.

i386/i386/Makefile.i386:
1) add support in to build support.s, exception.s and swtch.s

...and various changes to various header files to make all of the
above happen.


# 960173b9 15-Oct-1993 Rodney W. Grimes <rgrimes@FreeBSD.org>

genassym.c:
Remove NKMEMCLUSTERS, it is no longer define or used.

locores.s:
Fix comment on PTDpde and APTDpde to be pde instead of pte
Add new equation for calculating location of Sysmap
Remove Bill's old #ifdef garbage for counting up memory,
that stuff will never be made to work and was just cluttering
up the file.

Add code that places the PTD, page table pages, and kernel
stack below the 640k ISA hole if there is room for it, otherwise
put this stuff all at 1MB. This fixes the 28K bogusity in
the boot blocks, that can now go away!

Fix the caclulation of where first is to be dependent on
NKPDE so that we can skip over the above mentioned areas.
The 28K thing is now 44K in size due to the increase in
kernel virtual memory space, but since we no longer have
to worry about that this is no big deal.

Use if NNPX > 0 instead of ifdef NPX for floating point code.

machdep.c
Change the calculation of for the buffer cache to be
20% of all memory above 2MB and add back the upper limit
of 2/5's of the VM_KMEM_SIZE so that we do not eat ALL
of the kernel memory space on large memory machines, note
that this will not even come into effect unless you have
more than 32MB. The current buffer cache limit is 6.7MB
due to this caclulation.

It seems that we where erroniously allocating bufpages pages
for buffer_map. buffer_map is UNUSED in this implementation
of the buffer cache, but since the map is referenced in
several if statements a quick fix was to simply allocate
1 vm page (but no real memory) to it.

pmap.h
Remove rcsid, don't want them in the kernel files!

Removed some cruft inside an #ifdef DEBUGx that caused
compiler errors if you where compiling this for debug.

Use the #defines for PD_SHIFT and PG_SHIFT in place of
constants.

trap.c:
Remove patch kit header and rcsid, fix $Id$.
Now include "npx.h" and use NNPX for controlling the
floating point code.

Remove a now completly invalid check for a maximum virtual
address, the virtual address now ends at 0xFFFFFFFF so
there is no more MAX!! (Thanks David, I completly missed
that one!)

vm_machdep.c
Remove patch kit header and rcsid, fix $Id$.
Now include "npx.h" and use NNPX for controlling the
floating point code.

Replace several 0xFE00000 constants with KERNBASE


# a27df782 12-Oct-1993 Rodney W. Grimes <rgrimes@FreeBSD.org>

KPTDI_LAST renamed to KPTDI


# 9aa17d68 12-Oct-1993 Rodney W. Grimes <rgrimes@FreeBSD.org>

Eliminate definition of I386_PAGE_SIZE and use NBPG instead
Replace 0xFE000000 constants with KERNBASE
Use new definition NKPDE in place of a first-last+1 calculation.


# b145f751 30-Sep-1993 Rodney W. Grimes <rgrimes@FreeBSD.org>

This is a fix for the 32K DMA buffer region that was not accounted for,
it relocates it to be after the BIOS memory hole instead of right below
the 640K limit.
THANK YOU CHRIS!!!

From: <cgd@postgres.Berkeley.EDU>
Date: Wed, 29 Sep 93 18:49:58 -0700
basically, reserve a new 32k space right after firstaddr,
and put the buffer space there...

the diffs are below, and are in ~cgd/sys/i386/i386 (in machdep.c)
on freefall. i obviously can't test them, so if some of you would
look the diffs over and try them out...


# b4f05987 08-Sep-1993 Rodney W. Grimes <rgrimes@FreeBSD.org>

Changed the pg("ptdi> %x") to a printf and then a panic, since we are
going to panic shortly after this anyway. Destroys less state, and
keeps the machine from waiting for someone to smash the return key
a few times before it panics!


# 26931201 27-Jul-1993 David Greenman <dg@FreeBSD.org>

* Applied fixes from Bruce Evans to fix COW bugs, >1MB kernel loading,
profiling, and various protection checks that cause security holes
and system crashes.
* Changed min/max/bcmp/ffs/strlen to be static inline functions
- included from cpufunc.h in via systm.h. This change
improves performance in many parts of the kernel - up to 5% in the
networking layer alone. Note that this requires systm.h to be included
in any file that uses these functions otherwise it won't be able to
find them during the load.
* Fixed incorrect call to splx() in if_is.c
* Fixed bogus variable assignment to splx() in if_ed.c


# 5b81b6b3 12-Jun-1993 Rodney W. Grimes <rgrimes@FreeBSD.org>

Initial import, 0.1 + pk 0.2.4-B1