History log of /freebsd-current/sys/powerpc/booke/pmap.c
Revision Date Author Comments
# b0056b31 03-Jun-2024 Doug Moore <dougm@FreeBSD.org>

libkern: add ilog2 macro

The kernel source contains several definitions of an ilog2 function;
some are slower than necessary, and one of them is incorrect.
Elimininate them all and define an ilog2 macro in libkern to replace
them, in a way that is fast, correct for all argument types, and, in a
GENERIC kernel, includes a check for an invalid zero parameter.

Folks at Microsoft have verified that having a correct ilog2
definition for their MANA driver doesn't break it.

Reviewed by: alc, markj, mhorne (older version), jhibbits (older version)
Differential Revision: https://reviews.freebsd.org/D45170
Differential Revision: https://reviews.freebsd.org/D45235


# deab5717 27-May-2024 Mitchell Horne <mhorne@FreeBSD.org>

Adjust comments referencing vm_mem_init()

I cannot find a time where the function was not named this.

Reviewed by: kib, markj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45383


# fcace5ab 04-Apr-2024 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Reserve KVA for minidump working area

This was already handled for the AIM64 pmaps, so add the same to
Book-E64 pmap.


# 1f1b2286 31-Jan-2024 John Baldwin <jhb@FreeBSD.org>

pmap: Convert boolean_t to bool.

Reviewed by: kib (older version)
Differential Revision: https://reviews.freebsd.org/D39921


# 02320f64 19-Oct-2023 Zhenlei Huang <zlei@FreeBSD.org>

pmap: Prefer consistent naming for loader tunable

The sysctl knob 'vm.pmap.pv_entry_max' becomes a loader tunable since
7ff48af7040f (Allow a specific setting for pv entries) but is fetched
from system environment 'vm.pmap.pv_entries'. That is inconsistent and
obscure.

This reverts 36e1b9702e21 (Correct the tunable name in the message).

PR: 231577
Reviewed by: jhibbits, alc, kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42274


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 61fab134 25-May-2023 John Baldwin <jhb@FreeBSD.org>

powerpc booke: Add an __unused wrapper for a variable only used under DEBUG.


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 9b02f2da 24-Apr-2023 John Baldwin <jhb@FreeBSD.org>

powerpc: Use valid prototypes for function declarations with no arguments.

Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D39733


# d1426018 23-Apr-2023 Dimitry Andric <dim@FreeBSD.org>

powerpc: fix a few pmap related functions to return correct types

While experimenting with changing boolean_t to another type, I noticed
that several powerpc pmap related functions returned the wrong type:
boolean_t instead of int.

Fix several declarations and definitions to match the actual pmap
function types: pmap_dev_direct_mapped_t and pmap_ts_referenced_t.

MFC after: 3 days


# 7ae99f80 22-Sep-2022 John Baldwin <jhb@FreeBSD.org>

pmap_unmapdev/bios: Accept a pointer instead of a vm_offset_t.

This matches the return type of pmap_mapdev/bios.

Reviewed by: kib, markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D36548


# 8dc8feb5 26-Mar-2021 Jason A. Harmening <jah@FreeBSD.org>

Clean up a couple of MD warts in vm_fault_populate():

--Eliminate a big ifdef that encompassed all currently-supported
architectures except mips and powerpc32. This applied to the case
in which we've allocated a superpage but the pager-populated range
is insufficient for a superpage mapping. For platforms that don't
support superpages the check should be inexpensive as we shouldn't
get a superpage in the first place. Make the normal-page fallback
logic identical for all platforms and provide a simple implementation
of pmap_ps_enabled() for MIPS and Book-E/AIM32 powerpc.

--Apply the logic for handling pmap_enter() failure if a superpage
mapping can't be supported due to additional protection policy.
Use KERN_PROTECTION_FAILURE instead of KERN_FAILURE for this case,
and note Intel PKU on amd64 as the first example of such protection
policy.

Reviewed by: kib, markj, bdragon
Differential Revision: https://reviews.freebsd.org/D29439


# 6f3b523c 14-Oct-2020 Konstantin Belousov <kib@FreeBSD.org>

Avoid dump_avail[] redefinition.

Move dump_avail[] extern declaration and inlines into a new header
vm/vm_dumpset.h. This fixes default gcc build for mips.

Reviewed by: alc, scottph
Tested by: kevans (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26741


# b64b3133 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

powerpc: clean up empty lines in .c and .h files


# 4ae224c6 16-Jul-2020 Conrad Meyer <cem@FreeBSD.org>

Revert r240317 to prevent leaking pmap entries

Subsequent to r240317, kmem_free() was replaced with kva_free() (r254025).
kva_free() releases the KVA allocation for the mapped region, but no longer
clears the pmap (pagetable) entries.

An affected pmap_unmapdev operation would leave the still-pmap'd VA space
free for allocation by other KVA consumers. However, this bug easily
avoided notice for ~7 years because most devices (1) never call
pmap_unmapdev and (2) on amd64, mostly fit within the DMAP and do not need
KVA allocations. Other affected arch are less popular: i386, MIPS, and
PowerPC. Arm64, arm32, and riscv are not affected.

Reported by: Don Morris <dgmorris AT earthlink.net>
Submitted by: Don Morris (amd64 part)
Reviewed by: kib, markj, Don (!amd64 parts)
MFC after: I don't intend to, but you might want to
Sponsored by: Dell Isilon
Differential Revision: https://reviews.freebsd.org/D25689


# d3111144 05-Jun-2020 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc: Use IFUNCs for copyin/copyout/etc

Summary:
Radix on AIM, and all of Book-E (currently), can do direct addressing of
user space, instead of needing to map user addresses into kernel space.
Take advantage of this to optimize the copy(9) functions for this
behavior, and avoid effectively NOP translations.

Test Plan: Tested on powerpcspe, powerpc64/booke, powerpc64/AIM

Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D25129


# 45b69dd6 26-May-2020 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/mmu: Convert PowerPC pmap drivers to ifunc from kobj

With IFUNC support in the kernel, we can finally get rid of our poor-man's
ifunc for pmap, utilizing kobj. Since moea64 uses a second tier kobj as
well, for its own private methods, this adds a second pmap install function
(pmap_mmu_init()) to perform pmap 'post-install pre-bootstrap'
initialization, before the IFUNCs get initialized.

Reviewed by: bdragon


# 65bbba25 10-May-2020 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc64: Implement Radix MMU for POWER9 CPUs

Summary:
POWER9 supports two MMU formats: traditional hashed page tables, and Radix
page tables, similar to what's presesnt on most other architectures. The
PowerISA also specifies a process table -- a table of page table pointers--
which on the POWER9 is only available with the Radix MMU, so we can take
advantage of it with the Radix MMU driver.

Written by Matt Macy.

Differential Revision: https://reviews.freebsd.org/D19516


# 69e8f478 10-Apr-2020 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Use power-of-two mappings in 64-bit pmap_mapdev

Summary:
This reduces the precious TLB1 entry consumption (64 possible in
existing 64-bit cores), by adjusting the size and alignment of a device
mapping to a power of 2, to encompass the full mapping and its
surroundings.

One caveat with this: If a mapping really is smaller than a power of 2,
it's possible to get a machine check or hang if the 'missing' physical
space is accessed. In practice this should not be an issue for users,
as devices overwhelmingly have physical spaces on power-of-two sizes and
alignments, and any design that includes devices which don't follow this
can be addressed by undefining the POW2_MAPPINGS guard.

Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D24248


# d7c0543f 10-Apr-2020 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Add pte_find_next() to find the next in-use PTE

Summary:
Iterating over VM_MIN_ADDRESS->VM_MAXUSER_ADDRESS can take a very long
time iterating one page at a time (2**(log_2(SIZE)-12) operations),
yielding possibly several days or even weeks on 64-bit Book-E, even for
a largely empty, which can happen when swapping out a process by
vmdaemon. Speed this up by instead finding the next PTE at or equal to
the given VA.

Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D24238


# dd8775a1 10-Apr-2020 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Change Book-E 64-bit pmap to 4-level table

Summary:
The existing page table is fraught with errors, since it creates a hole
in the address space bits. Fix this by taking a cue from the POWER9
radix pmap, and make the page table 4 levels, 52 bits.

Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D24220


# abc00e5f 30-Mar-2020 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/pmap: Replace a logical TAILQ_FOREACH_SAFE with the real thing

No functional change, just cleanup.


# d926d578 09-Mar-2020 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Split out 32- and 64- bit pmap details from main body

Summary:
This is largely a straight-forward cleave of the 32-bit and 64-bit page
table specifics, along with the mmu_booke_*() functions that are wholely
different between the two implementations.

The ultimate goal of this is to make it easier to reason about and
update a specific implementation without wading through the other
implementation details. This is in support of further changes to the 64-bit
pmap.

Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D23983


# d029e3b3 24-Feb-2020 Justin Hibbits <jhibbits@FreeBSD.org>

Unbreak the 32-bit powerpc builds

Force unsigned integer usage by casting to vm_offset_t, to avoid integer
overflow, from r358305


# 0b2f2528 24-Feb-2020 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Use a pseudo-DMAP for the device mappings on booke64

Since powerpc64 has such a large virtual address space, significantly larger
than its physical address space, take advantage of this, and create yet
another DMAP-like instance for the device mappings. In this case, the
device mapping "DMAP" is in the 0x8000000000000000 - 0xc000000000000000
range, so as not to overlap the physical memory DMAP.

This will allow us to add TLB1 entry coalescing in the future, especially
useful for things like the radeonkms driver, which maps parts of the GPU at
a time, but eventually maps all of it, using up a lot of TLB1 entries (~40).


# 5915b638 21-Feb-2020 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Fix handling of pvh_global_lock and pmap lock

ptbl_alloc() is expected to return with the pvh_global_lock and pmap
lock held. However, it will return with them unlocked if nosleep is
specified.

Along with this, fix lock ordering of pvh_global_lock with respect to
the pmap lock in other places.

Differential Revision: https://reviews.freebsd.org/D23692


# 432ff6ee 17-Jan-2020 Brandon Bergren <bdragon@FreeBSD.org>

[PowerPC] Fix Book-E direct map for >=16G ram on e5500

It turns out the maximum TLB1 page size on e5500 is 4G, despite the format
being defined for up to 1TB.

So, we need to clamp the DMAP TLB1 entries to not attempt to create 16G or
larger entries.

Fixes boot on my X5000 in which I just installed 16G of RAM.

Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D23244


# caef3e12 06-Dec-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/pmap: NUMA-ize vm_page_array on powerpc

Summary:
This matches r351198 from amd64. This only applies to AIM64 and Book-E.
On AIM64 it short-circuits with one domain, to behave similar to
existing. Otherwise it will allocate 16MB huge pages to hold the page
array, across all NUMA domains. On the first domain it will shift the
page array base up, to "upper-align" the page array in that domain, so
as to reduce the number of pages from the next domain appearing in this
domain. After the first domain, subsequent domains will be allocated in
full 16MB pages, until the final domain, which can be short. This means
some inner domains may have pages accounted in earlier domains.

On Book-E the page array is setup at MMU bootstrap time so that it's
always mapped in TLB1, on both 32-bit and 64-bit. This reduces the TLB0
overhead for touching the vm_page_array, which reduces up to one TLB
miss per array access.

Since page_range (vm_page_startup()) is no longer used on Book-E but is on
32-bit AIM, mark the variable as potentially unused, rather than using a
nasty #if defined() list.

Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D21449


# ad73d2ab 03-Dec-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Fix some formatting errors in debug printfs

Use the right formats for the types given (vm_offset_t and vm_size_t are
both uint32_t on 32-bit platforms, and uint64_t on 64-bit platforms, and
match size_t in size, so we can use the size_t format as we do in other
similar code).

These were found by clang.


# 7194b0a3 18-Nov-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke pmap: Use the right 'tlbilx' form to invalidate TIDs

'tlbilxpid' is 'tlbilx 1, 0', while the existing form is 'tlbilx 0, 0',
which translates to 'tlbilxlpid', invalidating a LDPID. This effectively
invalidates the entire TLB, causing unnecessary reloads.


# b5d54294 05-Nov-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Fix pmap_mapdev_attr() for multi-TLB1 entry mappings

Also, fix pmap_change_attr() to ignore non-kernel mappings.

* Fix a masking bug in mmu_booke_mapdev_attr() which caused it to align
mappings to the smallest mapping alignment, instead of the largest. This
caused mappings to be potentially pessimally aligned, using more TLB
entries than necessary.
* Return existing mappings from mmu_booke_mapdev_attr() that span more than
one TLB1 entry. The drm-current-kmod drivers map discontiguous segments
of the GPU, resulting in more than one TLB entry being used to satisfy the
mapping.
* Ignore non-kernel mappings in mmu_booke_change_attr(). There's a bug in
the linuxkpi layer that causes it to actually try to change physical
address mappings, instead of virtual addresses. amd64 doesn't encounter
this because it ignores non-kernel mappings.

With this it's possible to use drm-current-kmod on Book-E.


# 730de0f7 03-Nov-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/pmap: Make use of tlb1_mapin_region in pmap_mapdev_attr()

tlb1_mapin_region() and pmap_mapdev_attr() do roughly the same thing -- map
a chunk of physical address space(memory or MMIO) into virtual, but do it in
differing ways. Unify the code, settling on pmap_mapdev_attr()'s algorithm,
to simplify and unify the logic. This fixes a bug with growing the kernel
mappings in mmu_booke_bootstrap(), where part of the mapping was not getting
done, leading to a hang when the unmapped VAs were accessed.


# ab3f2a38 02-Nov-2019 Brandon Bergren <bdragon@FreeBSD.org>

Add support for building Book-E kernels with clang/lld.

This involved several changes:

* Since lld does not like text relocations, replace SMP boot page text relocs
in booke/locore.S with position-independent math, and track the virtual base
in the SMP boot page header.

* As some SPRs are interpreted differently on clang due to the way it handles
platform-specific SPRs, switch m*dear and m*esr mnemonics out for regular
m*spr. Add both forms of SPR_DEAR to spr.h so the correct encoding is selected.

* Change some hardcoded 32 bit things in the boot page to be pointer-sized, and
fix alignment.

* Fix 64-bit build of booke/pmap.c when enabling pmap debugging.

Additionally, I took the opportunity to document how the SMP boot page works.

Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D21999


# 8b079fcc 31-Oct-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Fix TLB1 entry accounting

It's possible, with per-CPU mappings, for TLB1 indices to get out of sync.
This presents a problem when trying to insert an entry into TLB1 of all
CPUs. Currently that's done by assuming (hoping) that the TLBs are
perfectly synced, and inserting to the same index for all CPUs. However,
with aforementioned private mappings, this can result in overwriting
mappings on the other CPUs.

An example:

CPU0 CPU1
<setup all mappings> <idle>
3 private mappings
kick off CPU 1
initialize shared mappings (3 indices low)
Load kernel module, triggers 20 new mappings
Sync mappings at N-3
initialize 3 private mappings.

At this point, CPU 1 has all the correct mappings, while CPU 0 is missing 3
mappings that were shared across to CPU 1. When CPU 0 tries to access
memory in one of the overwritten mappings, it hangs while tripping through
the TLB miss handler. Device mappings are not stored in any page table.

This fixes by introducing a '-1' index for tlb1_write_entry_int(), so each
CPU searches for an available index private to itself.

MFC after: 3 weeks


# dc2b5bb4 22-Oct-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Fix Book-E boot post-minidump

r353489 added minidump support for powerpc64, but it added a dependency on
the dump_avail array. Leaving it uninitialized caused breakage in late
boot. Initialize dump_avail, even though the 64-bit booke pmap doesn't yet
support minidumps, but will in the future.


# 4ffdb9f2 19-Oct-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke pmap: Fix printf format type warnings


# 01cef4ca 16-Oct-2019 Mark Johnston <markj@FreeBSD.org>

Remove page locking from pmap_mincore().

After r352110 the page lock no longer protects a page's identity, so
there is no purpose in locking the page in pmap_mincore(). Instead,
if vm.mincore_mapped is set to the non-default value of 0, re-lookup
the page after acquiring its object lock, which holds the page's
identity stable.

The change removes the last callers of vm_page_pa_tryrelock(), so
remove it.

Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21823


# 2a499f92 16-Oct-2019 Konstantin Belousov <kib@FreeBSD.org>

Fix assert in PowerPC pmaps after introduction of object busy.

The VM_PAGE_OBJECT_BUSY_ASSERT() in pmap_enter() implementation should
be only asserted when the code is executed as result of pmap_enter(),
not when the same code is entered from e.g. pmap_enter_quick(). This
is relevant for all PowerPC pmap variants, because mmu_*_enter() is
used as the backend, and assert is located there.

Add a PowerPC private pmap_enter() PMAP_ENTER_QUICK_LOCKED flag to
indicate that the call is not from pmap_enter(). For non-quick-locked
calls, assert that the object is locked.

Reported and tested by: bdragon
Reviewed by: alc, bdragon, markj
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D22041


# 638f8678 14-Oct-2019 Jeff Roberson <jeff@FreeBSD.org>

(6/6) Convert pmap to expect busy in write related operations now that all
callers hold it.

This simplifies pmap code and removes a dependency on the object lock.

Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21596


# 205be21d 14-Oct-2019 Jeff Roberson <jeff@FreeBSD.org>

(3/6) Add a shared object busy synchronization mechanism that blocks new page
busy acquires while held.

This allows code that would need to acquire and release a very large number
of page busy locks to use the old mechanism where busy is only checked and
not held. This comes at the cost of false positives but never false
negatives which the single consumer, vm_fault_soft_fast(), handles.

Reviewed by: kib
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21592


# ec17d5e0 13-Oct-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/pmap: Tighten condition for removing tracked pages in Book-E pmap

There are cases where there's no vm_page_t structure for a given physical
address, such as the CCSR. In this case, trying to obtain the
md.page_tracked struct member would lead to a NULL dereference, and panic.
Tighten this up by checking for kernel_pmap AND that the page structure
actually exists before dereferencing. The flag can only be set when it's
tracked in the kernel pmap anyway.

MFC after: 3 weeks


# b119329d 25-Sep-2019 Mark Johnston <markj@FreeBSD.org>

Complete the removal of the "wire_count" field from struct vm_page.

Convert all remaining references to that field to "ref_count" and update
comments accordingly. No functional change intended.

Reviewed by: alc, kib
Sponsored by: Intel, Netflix
Differential Revision: https://reviews.freebsd.org/D21768


# e8bcf696 16-Sep-2019 Mark Johnston <markj@FreeBSD.org>

Revert r352406, which contained changes I didn't intend to commit.


# 41fd4b94 16-Sep-2019 Mark Johnston <markj@FreeBSD.org>

Fix a couple of nits in r352110.

- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.

Reported by: alc
Reviewed by: alc, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21639


# fee2a2fa 09-Sep-2019 Mark Johnston <markj@FreeBSD.org>

Change synchonization rules for vm_page reference counting.

There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped. The DRM ports have been updated to
accomodate the KPI changes.

Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486


# 7038069e 27-Aug-2019 Justin Hibbits <jhibbits@FreeBSD.org>

Revert a part of r350883 that should never have gone in

The wire_count change is not part of the unification, and doesn't even make
sense.

Reported by: markj


# 8da08d51 25-Aug-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Clean up pmap a little for 64-bit

64-bit Book-E pmap doesn't need copy and zero bounce pages, nor the mutex.
Don't initialize them or reserve space for them.


# e7f2151c 25-Aug-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Use the DMAP if possible in pmap_map()

This avoids unnecessary TLB usage for statically mapped regions, such as
vm_page_array.


# 6793e5b2 19-Aug-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc: Link Book-E kernels at the same address as AIM kernels

Summary:
Reduce the diff between AIM and Book-E even more. This also cleans up
vmparam.h significantly.

Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D21301


# 21943937 15-Aug-2019 Jeff Roberson <jeff@FreeBSD.org>

Move phys_avail definition into MI code. It is consumed in the MI layer and
doing so adds more flexibility with less redundant code.

Reviewed by: jhb, markj, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21250


# 141a0ab0 14-Aug-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/pmap: Enable UMA_MD_SMALL_ALLOC for 64-bit booke

The only thing blocking UMA_MD_SMALL_ALLOC from working on 64-bit booke
powerpc was a missing check in pmap_kextract(). Adding DMAP handling into
pmap_kextract(), we can now use UMA_MD_SMALL_ALLOC. This should improve
performance and stability a bit, since DMAP is always mapped in TLB1, so
this relieves pressure on TLB0.

MFC after: 3 weeks


# 3be09f30 11-Aug-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc: Unify pmap definitions between AIM and Book-E

This is part 2 of r347078, pulling the page directory out of the Book-E
pmap. This breaks KBI for anything that uses struct pmap (such as vm_map)
so any modules that access this must be rebuilt.


# 937e8f20 07-Aug-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/pmap: Minor optimizations to 64-bit booke pmap

Don't recalculate the VM page of the page table pages, just pass them down
to free. Also, use the pmap's page zero function instead of bzero().


# 8b1531ec 05-Aug-2019 Justin Hibbits <jhibbits@FreeBSD.org>

Fix build from r350622

It helps if my local kernel build has INVARIANTS.


# dc825fed 05-Aug-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/pmap: Simplify Book-E 64-bit page table management

There is no need for the 64-bit pmap to have a fixed number of page table
buffers. Since the 64-bit pmap has a DMAP, we can effectively have user
page tables limited only by total RAM size.


# eeacb3b0 08-Jul-2019 Mark Johnston <markj@FreeBSD.org>

Merge the vm_page hold and wire mechanisms.

The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics. The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.

This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead. Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.

No functional change is intended. __FreeBSD_version is bumped.

Reviewed by: alc, kib
Discussed with: jeff
Discussed with: jhb, np (cxgbe)
Tested by: pho (previous version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19247


# 72e58595 22-May-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: It helps to set variables before using them

Actually set the source and destination VA's before using them. Fixes a
bizarre panic on 32-bit Book-E. Not sure why this wasn't caught by the
compiler.


# b4b2e7a7 21-May-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Use wrtee instead of msr to restore EE bit

The MSR[EE] bit does not require synchronization when changing. This is a
trivial micro-optimization, removing the trailing isync from mtmsr().

MFC after: 1 week


# 2b03b6bd 08-May-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Rewrite pmap_sync_icache() a bit

* Make mmu_booke_sync_icache() use the DMAP on 64-bit prcoesses, no need to
map the page into the user's address space. This removes the
pvh_global_lock from the equation on 64-bit.
* Don't map the page with user-readability on 32-bit. I don't know what the
chance of a given user process being able to access the NULL page when
another process's page is added there, but it doesn't seem like a good
idea to map it to NULL with user read permissions.
* Only sync as much as we need to. There are only two significant places
where pmap_sync_icache is used: proc_rwmem(), and the SIGILL second-chance
for powerpc. The SIGILL second chance is likely the most common, and only
syncs 4 bytes, so avoid the other 127 loop iterations (4096 / 32 byte
cacheline) in __syncicache().


# 4023311a 08-May-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Do as much work outside of TLB locks as possible

Reduce the surface area of the TLB locks. Unfortunately the same trick for
serializing the tlbie instruction on OEA64 cannot be used here to reduce the
scope of the tlbivax mutex to the tlbsync only, as the mutex also serializes
the TLB miss lock as a side effect, so contention on this lock may not be
reducible any further.


# 2154a866 05-May-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Use #ifdef __powerpc64__ instead of hw_direct_map in places

Since the DMAP is only available on powerpc64, and is *always* available on
Book-E powerpc64, don't penalize either side (32-bit or 64-bit) by always
checking hw_direct_map to perform operations. This saves 5-10% time on
various ports builds, and on buildworld+buildkernel on Book-E hardware.

MFC after: 3 weeks


# bfd07877 05-May-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Fix size check for phys_avail in pmap bootstrap

Use the nitems() macro instead of the expansion, a'la r298352. Also, fix the
location of this check to after initializing availmem_regions_sz, so that the
check isn't always against 0, thus always failing (nitems(phys_avail) is always
more than 0).


# a9b033c2 15-Feb-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Fix 32-bit build

MFC after: 2 weeks
MFC with: 344202


# 0454ed97 15-Feb-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: depessimize MAS register updates

We only need to isync before we actually use the MAS registers, so before and
after the TLB read/write/sync/search operations.

MFC after: 2 weeks


# 18f7e2b4 15-Feb-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Use DMAP where possible for page copy and zeroing

This avoids several locks and pmap_kenter()'s, improving performance
marginally.

MFC after: 2 weeks


# 64143619 12-Feb-2019 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Use the 'tlbilx' instruction on newer cores

Newer cores have the 'tlbilx' instruction, which doesn't broadcast over
CoreNet. This is significantly faster than walking the TLB to invalidate
the PID mappings. tlbilx with the arguments given takes 131 clock cycles to
complete, as opposed to 512 iterations through the loop plus tlbre/tlbwe at
each iteration.

MFC after: 3 weeks


# f1e0cb5e 03-Dec-2018 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc: preload_addr_relocate is no longer necessary for booke

The same behavior was moved to machdep.c, paired with AIM's relocation,
making this redundant. With this, it's now possible to boot FreeBSD with
ubldr on a uboot Book-E platform, even with a
KERNBASE != VM_MIN_KERNEL_ADDRESS.


# b9964353 27-Nov-2018 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Fix debug printfs in pmap

Add missing '%'s so printf formats are actually handled.


# 6f4827ac 21-Oct-2018 Justin Hibbits <jhibbits@FreeBSD.org>

powerpc/booke: Turn tlb*_print_tlbentries() into 'show tlb*' DDB commands

debugf() is unnecessary for the TLB printing functions, as they're only
intended to be used from ddb. Instead, make them full DDB 'show'
commands, so now it can be written as 'show tlb1' and 'show tlb0'
instead of calling the function, hoping DEBUG has been defined.


# 340a810b 19-Aug-2018 Justin Hibbits <jhibbits@FreeBSD.org>

booke pmap: hide debug-ish printf behind bootverbose

It's not necessary during normal operation to know the mapped region size
and wasted space.


# 38cfc8c3 27-May-2018 Justin Hibbits <jhibbits@FreeBSD.org>

Print the full-width pointer values in hex.

PRI0ptrX is used to print a zero-padded hex value of the architecture's bitness,
so on 64-bit architectures it'll print the full 64 bit address.


# 9225dfbd 03-Apr-2018 Justin Hibbits <jhibbits@FreeBSD.org>

Correct the ilog2() for calculating memory sizes.

TLB1 can handle ranges up to 4GB (through e5500, larger in e6500), but
ilog2() took a unsigned int, which maxes out at 4GB-1, but truncates
silently. Increase the input range to the largest supported, at least for
64-bit targets. This lets the DMAP be completely mapped, instead of only
1GB blocks with it assuming being fully mapped.


# 9f5b999a 02-Apr-2018 Justin Hibbits <jhibbits@FreeBSD.org>

Add support for a pmap direct map for 64-bit Book-E

As with AIM64, map the DMAP at the beginning of the fourth "quadrant" of
memory, and move the KERNBASE to the the start of KVA.

Eventually we may run the kernel out of the DMAP, but for now, continue
booting as it has been.


# a029f841 19-Mar-2018 Justin Hibbits <jhibbits@FreeBSD.org>

Fix powerpc Book-E build post-331018/331048.

pagedaemon_wakeup() was moved from vm_pageout.h to vm_pagequeue.h.


# 2c0f13aa 20-Feb-2018 Konstantin Belousov <kib@FreeBSD.org>

vm_wait() rework.

Make vm_wait() take the vm_object argument which specifies the domain
set to wait for the min condition pass. If there is no object
associated with the wait, use curthread' policy domainset. The
mechanics of the wait in vm_wait() and vm_wait_domain() is supplied by
the new helper vm_wait_doms(), which directly takes the bitmask of the
domains to wait for passing min condition.

Eliminate pagedaemon_wait(). vm_domain_clear() handles the same
operations.

Eliminate VM_WAIT and VM_WAITPFAULT macros, the direct functions calls
are enough.

Eliminate several control state variables from vm_domain, unneeded
after the vm_wait() conversion.

Scetched and reviewed by: jeff
Tested by: pho
Sponsored by: The FreeBSD Foundation, Mellanox Technologies
Differential revision: https://reviews.freebsd.org/D14384


# bce6d88b 17-Feb-2018 Justin Hibbits <jhibbits@FreeBSD.org>

Merge AIM and Book-E PCPU fields

This is part of a long-term goal of merging Book-E and AIM into a single GENERIC
kernel. As more work is done, the struct may be optimized further.

Reviewed by: nwhitehorn


# e958ad4c 12-Feb-2018 Jeff Roberson <jeff@FreeBSD.org>

Make v_wire_count a per-cpu counter(9) counter. This eliminates a
significant source of cache line contention from vm_page_alloc(). Use
accessors and vm_page_unwire_noq() so that the mechanism can be easily
changed in the future.

Reviewed by: markj
Discussed with: kib, glebius
Tested by: pho (earlier version)
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14273


# e2068d0b 06-Feb-2018 Jeff Roberson <jeff@FreeBSD.org>

Use per-domain locks for vm page queue free. Move paging control from
global to per-domain state. Protect reservations with the free lock
from the domain that they belong to. Refactor to make vm domains more
of a first class object.

Reviewed by: markj, kib, gallatin
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14000


# eb1baf72 28-Jan-2018 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Remove hard-coded trap-handling logic involving the segmented memory model
used with hashed page tables on AIM and place it into a new, modular pmap
function called pmap_decode_kernel_ptr(). This function is the inverse
of pmap_map_user_ptr(). With POWER9 radix tables, which mapping to use
becomes more complex than just AIM/BOOKE and it is best to have it in
the same place as pmap_map_user_ptr().

Reviewed by: jhibbits


# 04329fa7 14-Jan-2018 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Move the pmap-specific code in copyinout.c that gets pointers to userland
buffers into a new pmap-module function pmap_map_user_ptr() that can
be implemented by the respective modules. This is required to implement
non-segment-based AIM-ish MMU systems such as the radix-tree page tables
introduced by POWER ISA 3.0 and present on POWER9.

Reviewed by: jhibbits


# 713e8449 09-Dec-2017 Justin Hibbits <jhibbits@FreeBSD.org>

Retrieve the page outside of holding locks

pmap_track_page() only works with physical memory pages, which have a
constant vm_page_t address. Microoptimize pmap_track_page() to perform one
less operation under the lock.


# 94a9d7c3 07-Dec-2017 Justin Hibbits <jhibbits@FreeBSD.org>

Remove PTE VA mappings for tracked pages in 64-bit mode

This was done in 32-bit mode, but not duplicated when 64-bit mode was
brought in. Without this, stale mappings can be left, leading to odd
crashes when the wrong VA is checked in XX_PhysToVirt() (dpaa(4)).


# 3de971a6 28-Nov-2017 Justin Hibbits <jhibbits@FreeBSD.org>

Only check the page tables if within the KVA.

Devices aren't mapped within the KVA, and with the way 64-bit hashes the
addresses pte_vatopa() may not return a 0 physical address for a device.

MFC after: 1 week


# 71e3c308 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/powerpc: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# 41eeef87 26-Nov-2017 Justin Hibbits <jhibbits@FreeBSD.org>

Synchronize TLB1 mappings when created

This allows modules creating mappings to be loaded post-boot, after SMP has
started. Without this, the TLB1 mappings can become unsynchronized and lead
to kernel page faults when accessed on the alternate CPUs.

MFC after: 3 weeks


# 1ccb1458 20-Nov-2017 Justin Hibbits <jhibbits@FreeBSD.org>

Check the page table before TLB1 in pmap_kextract()

The vast majority of pmap_kextract() calls are looking for a physical memory
address, not a device address. By checking the page table first this saves
the formerly inevitable 64 (on e500mc and derivatives) iteration loop
through TLB1 in the most common cases.

Benchmarking this on the P5020 (e5500 core) yields a 300% throughput
improvement on dtsec(4) (115Mbit/s -> 460Mbit/s) measured with iperf.

Benchmarked on the P1022 (e500v2 core, 16 TLB1 entries) yields a 50%
throughput improvement on tsec(4) (~93Mbit/s -> 165Mbit/s) measured with
iperf.

MFC after: 1 week
Relnotes: Maybe (significant performance improvement)


# 2d968f6d 09-Nov-2017 Justin Hibbits <jhibbits@FreeBSD.org>

Properly initialize the full md_page structure


# 06ba753a 09-Nov-2017 Justin Hibbits <jhibbits@FreeBSD.org>

Book-E pmap_mapdev_attr() improvements

* Check TLB1 in all mapdev cases, in case the memattr matches an existing
mapping (doesn't need to be MAP_DEFAULT).
* Fix mapping where the starting address is not a multiple of the widest size
base. For instance, it will now properly map 0xffffef000, size 0x11000 using
2 TLB entries, basing it at 0x****f000, instead of 0x***00000.

MFC after: 2 weeks


# da7266dd 04-Jul-2017 Justin Hibbits <jhibbits@FreeBSD.org>

Remove an obsolete comment

This has been wrong for well over a year, we support the full 36-bit
(or more) PA space.


# d7fd731d 29-Jun-2017 Justin Hibbits <jhibbits@FreeBSD.org>

Use the more common Book-E idiom for disabling interrupts.

Book-E has the wrteei/wrtee instructions for writing the PSL_EE bit, ignoring
all others. Use this instead of the AIM-typical mtmsr.

MFC with: r320392


# 3d135710 26-Jun-2017 Justin Hibbits <jhibbits@FreeBSD.org>

Disable interrupts when updating the TLB

Without disabling interrupts it's possible for another thread to preempt
and update the registers post-read (tlb1_read_entry) or pre-write
(tlb1_write_entry), and confuse the kernel with mixed register states.

MFC after: 2 weeks


# e683c328 17-Mar-2017 Justin Hibbits <jhibbits@FreeBSD.org>

Introduce 64-bit PowerPC Book-E support

Extend the Book-E pmap to support 64-bit operation. Much of this was taken from
Juniper's Junos FreeBSD port. It uses a 3-level page table (page directory
list -- PP2D, page directory, page table), but has gaps in the page directory
list where regions will repeat, due to the design of the PP2D hash (a 20-bit gap
between the two parts of the index). In practice this may not be a problem
given the expanded address space. However, an alternative to this would be to
use a 4-level page table, like Linux, and possibly reduce the available address
space; Linux appears to use a 46-bit address space. Alternatively, a cache of
page directory pointers could be used to keep the overall design as-is, but
remove the gaps in the address space.

This includes a new kernel config for 64-bit QorIQ SoCs, based on MPC85XX, with
the following notes:
* The DPAA driver has not yet been ported to 64-bit so is not included in the
kernel config.
* This has been tested on the AmigaOne X5000, using a MD_ROOT compiled in
(total size kernel+mdroot must be under 64MB).
* This can run both 32-bit and 64-bit processes, and has even been tested to run
a 32-bit init with 64-bit children.

Many thanks to stevek and marcel for getting Juniper's FreeBSD patches open
sourced to be used here, and to stevek for reviewing, and providing some
historical contexts on quirks of the code.

Reviewed by: stevek
Obtained from: Juniper (in part)
MFC after: 2 months
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D9433


# 27da2007 20-Feb-2017 Justin Hibbits <jhibbits@FreeBSD.org>

Correct the return value for pmap_change_attr()

pmap_change_attr() returns an error code, not a paddr. This function is
currently unused for powerpc.

MFC after: 2 weeks


# 0a82e6f0 04-Dec-2016 Justin Hibbits <jhibbits@FreeBSD.org>

Use trunc_page() instead of rolling my own in pmap_track_page()


# aa38c69b 03-Dec-2016 Justin Hibbits <jhibbits@FreeBSD.org>

Fix a typo (move parenthesis to correct location in the line).

Before this, it would cause the one consumer of this API in powerpc usage
(dev/dpaa) to set the PTE WIMG flags to empty instead of --M-, making the
cache-enabled buffer portals non-coherent.


# b2f831c0 15-Nov-2016 Justin Hibbits <jhibbits@FreeBSD.org>

Simplify the page tracking for VA<->PA translations.

Drop the tracking down to the pmap layer, with optimizations to only track
necessary pages. This should give a (slight) performance improvement, as well
as a stability improvement, as the tracking is already mostly handled by the
pmap layer.


# 8cb0c102 10-Sep-2016 Alan Cox <alc@FreeBSD.org>

Various changes to pmap_ts_referenced()

Move PMAP_TS_REFERENCED_MAX out of the various pmap implementations and
into vm/pmap.h, and describe what its purpose is. Eliminate the archaic
"XXX" comment about its value. I don't believe that its exact value, e.g.,
5 versus 6, matters.

Update the arm64 and riscv pmap implementations of pmap_ts_referenced()
to opportunistically update the page's dirty field.

On amd64, use the PDE value already cached in a local variable rather than
dereferencing a pointer again and again.

Reviewed by: kib, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D7836


# bdee3435 06-Sep-2016 Justin Hibbits <jhibbits@FreeBSD.org>

Allow pmap_early_io_unmap() to reclaim memory

pmap_early_io_map()/pmap_early_io_unmap(), if used in pairs, should be used in
the form:

pmap_early_io_map()
..do stuff..
pmap_early_io_unmap()

Without other allocations in the middle. Without reclaiming memory this can
leave large holes in the device space.

While here, make a simple change to the unmap loop which now permits it to unmap
multiple TLB entries in the range.


# dbbaf04f 03-Sep-2016 Mark Johnston <markj@FreeBSD.org>

Remove support for idle page zeroing.

Idle page zeroing has been disabled by default on all architectures since
r170816 and has some bugs that make it seemingly unusable. Specifically,
the idle-priority pagezero thread exacerbates contention for the free page
lock, and yields the CPU without releasing it in non-preemptive kernels. The
pagezero thread also does not behave correctly when superpage reservations
are enabled: its target is a function of v_free_count, which includes
reserved-but-free pages, but it is only able to zero pages belonging to the
physical memory allocator.

Reviewed by: alc, imp, kib
Differential Revision: https://reviews.freebsd.org/D7714


# 60152a40 23-Aug-2016 Justin Hibbits <jhibbits@FreeBSD.org>

Fix system hang when large FDT is in use

Summary:
Kernel maps only one page of FDT. When FDT is more than one page in size, data
TLB miss occurs on memmove() when FDT is moved to kernel storage
(sys/powerpc/booke/booke_machdep.c, booke_init())

This introduces a pmap_early_io_unmap() to complement pmap_early_io_map(), which
can be used for any early I/O mapping, but currently is only used when mapping
the fdt.

Submitted by: Ivan Krivonos <int0dster_gmail.com>
Differential Revision: https://reviews.freebsd.org/D7605


# 81ef73fb 22-Aug-2016 Justin Hibbits <jhibbits@FreeBSD.org>

Take into account mas7/8 when reading/writing TLB entries on e6500

Summary: Current booke/pmap code ignores mas7 and mas8 on e6500 CPU.

Submitted by: Ivan Krivonos <int0dster_gmail.com>
Differential Revision: https://reviews.freebsd.org/D7606


# 791578a8 13-Aug-2016 Justin Hibbits <jhibbits@FreeBSD.org>

Add missing pmap_kenter() method for book-e.

This isn't added to AIM yet, because it's not yet needed. It's needed for
Book-E ePAPR boot support.

X-MFC With: r304047


# 910c0798 26-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/powerpc: make use of the howmany() macro when available.

We have a howmany() macro in the <sys/param.h> header that is
convenient to re-use as it makes things easier to read.


# d9c9c81c 21-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: use our roundup2/rounddown2() macros when param.h is available.

rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.

This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.


# f60708c9 18-Apr-2016 Justin Hibbits <jhibbits@FreeBSD.org>

Fix SMP booting for PowerPC Book-E

Summary:
PowerPC Book-E SMP is currently broken for unknown reasons. Pull in
Semihalf changes made c2012 for e500mc/e5500, which enables booting SMP.

This eliminates the shared software TLB1 table, replacing it with
tlb1_read_entry() function.

This does not yet support ePAPR SMP booting, and doesn't handle resetting CPUs
already released (ePAPR boot releases APs to a spin loop waiting on a specific
address). This will be addressed in the near future by using the MPIC to reset
the AP into our own alternate boot address.

This does include a change to the dpaa/dtsec(4) driver, to mark the portals as
CPU-private.

Test Plan:
Tested on Amiga X5000/20 (P5020). Boots, prints the following
messages:

Adding CPU 0, pir=0, awake=1
Waking up CPU 1 (dev=1)
Adding CPU 1, pir=20, awake=1
SMP: AP CPU #1 launched

top(1) shows CPU1 active.

Obtained from: Semihalf
Relnotes: Yes
Differential Revision: https://reviews.freebsd.org/D5945


# f00e9904 10-Apr-2016 Justin Hibbits <jhibbits@FreeBSD.org>

VM_MAXUSER_ADDRESS is highest page start, not highest address.

In case a single page mapping is requested first, which might overlap the user
address space, fix the device map block to the next page.


# f2c3b7f2 10-Apr-2016 Justin Hibbits <jhibbits@FreeBSD.org>

Restructure device mappings for Book-E.

Summary:
There is currently a 1GB hole between user and kernel address spaces
into which direct (1:1 PA:VA) device mappings go. This appears to go largely
unused, leaving all devices to contend with the 128MB block at the end of the
32-bit space (0xf8000000-0xffffffff). This easily fills up, and needs to be
densely packed. However, dense packing wastes precious TLB1 space, of which
there are only 16 (e500v2) or 64(e5500) entries available.

Change this by using the 1GB space for all device mappings, and allow the kernel
to use the entire upper 1GB for KVA. This also allows us to use sparse device
mappings, freeing up TLB entries.

Test Plan: Boot tested on p5020.

Differential Revision: https://reviews.freebsd.org/D5832


# 0aeed3e9 28-Feb-2016 Justin Hibbits <jhibbits@FreeBSD.org>

Add support for the Freescale dTSEC DPAA-based ethernet controller.

Freescale's QorIQ line includes a new ethernet controller, based on their
Datapath Acceleration Architecture (DPAA). This uses a combination of a Frame
manager, Buffer manager, and Queue manager to improve performance across all
interfaces by being able to pass data directly between hardware acceleration
interfaces.

As part of this import, Freescale's Netcomm Software (ncsw) driver is imported.
This was an attempt by Freescale to create an OS-agnostic sub-driver for
managing the hardware, using shims to interface to the OS-specific APIs. This
work was abandoned, and Freescale's primary work is in the Linux driver (dual
BSD/GPL license). Hence, this was imported directly to sys/contrib, rather than
going through the vendor area. Going forward, FreeBSD-specific changes may be
made to the ncsw code, diverging from the upstream in potentially incompatible
ways. An alternative could be to import the Linux driver itself, using the
linuxKPI layer, as that would maintain parity with the vendor-maintained driver.
However, the Linux driver has not been evaluated for reliability yet, and may
have issues with the import, whereas the ncsw-based driver in this commit was
completed by Semihalf 4 years ago, and is very stable.

Other SoC modules based on DPAA, which could be added in the future:
* Security and Encryption engine (SEC4.x, SEC5.x)
* RAID engine

Additional work to be done:
* Implement polling mode
* Test vlan support
* Add support for the Pattern Matching Engine, which can do regular expression
matching on packets.

This driver has been tested on the P5020 QorIQ SoC. Others listed in the
dtsec(4) manual page are expected to work as the same DPAA engine is included in
all.

Obtained from: Semihalf
Relnotes: Yes
Sponsored by: Alex Perez/Inertial Computing


# 0f7aeab0 27-Feb-2016 Justin Hibbits <jhibbits@FreeBSD.org>

Implement pmap_change_attr() for PowerPC (Book-E only for now)

Summary:
Some drivers need special memory requirements. X86 solves this with a
pmap_change_attr() API, which DRM uses for changing the mapping of the GART and
other memory regions. Implement the same function for PowerPC. AIM currently
does not need this, but will in the future for DRM, so a default is added for
that, for business as usual. Book-E has some drivers coming down that do
require non-default memory coherency. In this case, the Datapath Acceleration
Architecture (DPAA) based ethernet controller has 2 regions for the buffer
portals: cache-inhibited, and cache-enabled. By default, device memory is
cache-inhibited. If the cache-enabled memory regions are mapped
cache-inhibited, an alignment exception is thrown on access.

Test Plan:
Tested with a new driver to be added after this (DPAA dTSEC ethernet driver).
No alignment exceptions thrown, driver works as expected with this.

Reviewed By: nwhitehorn
Sponsored by: Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D5471


# debd17c5 15-Feb-2016 Justin Hibbits <jhibbits@FreeBSD.org>

Fix a panic bug that cropped up in the PTE rewrite.

PTE was getting overwritten by just the flags.

Pointy-hat to: jhibbits


# 64a982ea 11-Feb-2016 Justin Hibbits <jhibbits@FreeBSD.org>

Migrate the PTE format for book-e to standardize on the 'indirect PTE' format

Summary:
The revised Book-E spec, adding the specification for the MMUv2 and e6500,
includes a hardware PTE layout for indirect page tables. In order to support
this in the future, migrate the PTE format to match the MMUv2 hardware PTE
format.

Test Plan: Boot tested on a P5020 board. Booted to multiuser mode.

Differential Revision: https://reviews.freebsd.org/D5224


# 49d36ba4 25-Jan-2016 Justin Hibbits <jhibbits@FreeBSD.org>

Older Book-E processors (e500v1/e500v2) don't support dcbzl.

The only difference between dcbzl and dcbz is dcbzl operates on native cache
line lengths regardless of L1CSR0[DCBZ32]. Since we don't change the cache line
size, the cacheline_size variable will reflect the used cache line length, and
dcbz will work as expected.


# 30857eda 25-Jan-2016 Justin Hibbits <jhibbits@FreeBSD.org>

Fix a debug printf().

Somehow this printf() was missed in the conversion of vm_paddr_t to 64-bit, and
made it through until now.

Sponsored by: Alex Perez/Inertial Computing


# e1a51e19 19-Jan-2016 Justin Hibbits <jhibbits@FreeBSD.org>

Revert a printf change from r294307.

Caused build failures with MPC85XX.

Pointy-hat to: jhibbits


# ca31148c 18-Jan-2016 Justin Hibbits <jhibbits@FreeBSD.org>

Hide most of the PTE initialization and management.

By confining the page table management to a handful of functions it'll be
easier to modify the page table scheme without affecting other functions.
This will be necessary when 64-bit support is added, and page tables become
much larger.


# eabf8946 29-Dec-2015 Justin Hibbits <jhibbits@FreeBSD.org>

Optimize zero_page for book-e mmu.

Instead of indirectly calling bzero() through mmu_booke_zero_page_area, zero the
full page the same way as the AIM pmap logic does: using dcbz.


# 459021cc 29-Dec-2015 Justin Hibbits <jhibbits@FreeBSD.org>

Rewrite tid_flush() in C.

There's no need for it to be in asm. Also, by writing in C, and marking it
static in pmap.c, it saves a branch to the function itself, as it's only used in
one location. The generated asm is virtually identical to the handwritten code.


# b3936ebe 23-Dec-2015 Justin Hibbits <jhibbits@FreeBSD.org>

Extend Book-E to support >4GB RAM

Summary:
With some additional changes for AIM, that could also support much
larger physmem sizes. Given that 32-bit AIM is more or less obsolete, though,
it's not worth it at this time.

Differential Revision: https://reviews.freebsd.org/D4345


# ef596c40 20-Nov-2015 Justin Hibbits <jhibbits@FreeBSD.org>

trunc_page() goes through unsigned long, which is too short.

sizeof(unsigned long) < sizeof(vm_paddr_t) on Book-E, which uses 36-bit
addressing. With this, a CCSR with a physical address above 4GB successfully
maps.

Sponsored by: Alex Perez/Inertial Computing


# 12d44566 29-Aug-2015 Justin Hibbits <jhibbits@FreeBSD.org>

The TLB1 TSIZE is a multiple of 4, not 2, so shift 2 bits, not 1.


# afefc223 27-Aug-2015 Justin Hibbits <jhibbits@FreeBSD.org>

Extend pmap to support e500mc and e5500.

As part of this, clean up tlb1_init(), since bootinfo is always NULL here just
eliminate the loop altogether.

Also, fix a bug in mmu_booke_mapdev_attr() where it's possible to map a larger
immediately following a smaller page, causing the mappings to overlap. Instead,
break up the new mapping into smaller chunks. The downside to this is that it
uses more precious TLB1 entries, which, on smaller chips (e500v2) it could cause
problems with TLB1 being out of space (e500v2 only has 16 TLB1 entries).

Obtained from: Semihalf (partial)
Sponsored by: Alex Perez/Inertial Computing


# b239c24b 22-Aug-2015 Justin Hibbits <jhibbits@FreeBSD.org>

Enhance book-e pmap for 36-bit physaddr

Summary:
This is (probably step 1) of enhancing the book-e pmap to support the full
36-bit physical address space on Freescale e500 and e5500 cores.

Thus far it has only been regression tested on one platform. Since I only have
one other Book-E platform (e5500), that needs work beyond this, I haven't yet
tested it on this.

Test Plan: Regression tested on my RouterBoard RB800.

Reviewed By: marcel
Differential Revision: https://reviews.freebsd.org/D3027


# edc82223 10-Aug-2015 Konstantin Belousov <kib@FreeBSD.org>

Make kstack_pages a tunable on arm, x86, and powepc. On i386, the
initial thread stack is not adjusted by the tunable, the stack is
allocated too early to get access to the kernel environment. See
TD0_KSTACK_PAGES for the thread0 stack sizing on i386.

The tunable was tested on x86 only. From the visual inspection, it
seems that it might work on arm and powerpc. The arm
USPACE_SVC_STACK_TOP and powerpc USPACE macros seems to be already
incorrect for the threads with non-default kstack size. I only
changed the macros to use variable instead of constant, since I cannot
test.

On arm64, mips and sparc64, some static data structures are sized by
KSTACK_PAGES, so the tunable is disabled.

Sponsored by: The FreeBSD Foundation
MFC after: 2 week


# 713841af 04-Aug-2015 Jason A. Harmening <jah@FreeBSD.org>

Add two new pmap functions:
vm_offset_t pmap_quick_enter_page(vm_page_t m)
void pmap_quick_remove_page(vm_offset_t kva)

These will create and destroy a temporary, CPU-local KVA mapping of a specified page.

Guarantees:
--Will not sleep and will not fail.
--Safe to call under a non-sleepable lock or from an ithread

Restrictions:
--Not guaranteed to be safe to call from an interrupt filter or under a spin mutex on all platforms
--Current implementation does not guarantee more than one page of mapping space across all platforms. MI code should not make nested calls to pmap_quick_enter_page.
--MI code should not perform locking while holding onto a mapping created by pmap_quick_enter_page

The idea is to use this in busdma, for bounce buffer copies as well as virtually-indexed cache maintenance on mips and arm.

NOTE: the non-i386, non-amd64 implementations of these functions still need review and testing.

Reviewed by: kib
Approved by: kib (mentor)
Differential Revision: http://reviews.freebsd.org/D3013


# 721555e7 16-Jul-2015 Zbigniew Bodek <zbb@FreeBSD.org>

Fix KSTACK_PAGES issue when the default value was changed in KERNCONF

If KSTACK_PAGES was changed to anything alse than the default,
the value from param.h was taken instead in some places and
the value from KENRCONF in some others. This resulted in
inconsistency which caused corruption in SMP envorinment.

Ensure all places where KSTACK_PAGES are used the opt_kstack_pages.h
is included.

The file opt_kstack_pages.h could not be included in param.h
because was breaking the toolchain compilation.

Reviewed by: kib
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3094


# 0936003e 04-Jul-2015 Justin Hibbits <jhibbits@FreeBSD.org>

Use the correct type for physical addresses.

On Book-E, physical addresses are actually 36-bits, not 32-bits. This is
currently worked around by ignoring the top bits. However, in some cases, the
boot loader configures CCSR to something above the 32-bit mark. This is stage 1
in updating the pmap to handle 36-bit physaddr.


# 6c8df582 29-Apr-2015 Justin Hibbits <jhibbits@FreeBSD.org>

Unify booke and AIM machdep.

Much of the code was common to begin with. There is one nit, which is likely
not an issue at all. With the old code, the AIM machdep would __syncicache()
the entire kernel core at setup. However, in the unified setup, that seems to
hang on the MPC7455, perhaps because it's running later than before. Removing
this allows it to boot just fine. Examining the code, the FreeBSD loader
already does syncicache of the full kernel, and each module loaded, so this
doesn't appear to be an actual problem.

Initial code by Nathan Whitehorn.


# cf0b1c33 28-Mar-2015 Justin Hibbits <jhibbits@FreeBSD.org>

Wrap #ifdef guards around pmap_bootstrap ap. It's only used in SMP, and
building without SMP causes a build failure.

MFC after: 1 month


# 5c845fde 07-Mar-2015 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Make 32-bit PowerPC kernels, like 64-bit PowerPC kernels, position-independent
executables. The goal here, not yet accomplished, is to let the e500 kernel
run under QEMU by setting KERNBASE to something that fits in low memory and
then having the kernel relocate itself at runtime.


# d1295abd 04-Mar-2015 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Move Book-E/AIM dependent bits for setting user PMAP during thread switch
out of cpu_switch() and into pmap_activate() where they belong. This also
removes all the #ifdef from cpu_switch().


# 2c8f60ac 01-Mar-2015 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Missed local diff.


# 3846dae8 01-Mar-2015 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Initialize NX stack capabilities and direct map status in pmap like on AIM.


# bdb9ab0d 06-Jan-2015 Mark Johnston <markj@FreeBSD.org>

Factor out duplicated code from dumpsys() on each architecture into generic
code in sys/kern/kern_dump.c. Most dumpsys() implementations are nearly
identical and simply redefine a number of constants and helper subroutines;
a generic implementation will make it easier to implement features around
kernel core dumps. This change does not alter any minidump code and should
have no functional impact.

PR: 193873
Differential Revision: https://reviews.freebsd.org/D904
Submitted by: Conrad Meyer <conrad.meyer@isilon.com>
Reviewed by: jhibbits (earlier version)
Sponsored by: EMC / Isilon Storage Division


# 39ffa8c1 08-Aug-2014 Konstantin Belousov <kib@FreeBSD.org>

Change pmap_enter(9) interface to take flags parameter and superpage
mapping size (currently unused). The flags includes the fault access
bits, wired flag as PMAP_ENTER_WIRED, and a new flag
PMAP_ENTER_NOSLEEP to indicate that pmap should not sleep.

For powerpc aim both 32 and 64 bit, fix implementation to ensure that
the requested mapping is created when PMAP_ENTER_NOSLEEP is not
specified, in particular, wait for the available memory required to
proceed.

In collaboration with: alc
Tested by: nwhitehorn (ppc aim32 and booke)
Sponsored by: The FreeBSD Foundation and EMC / Isilon Storage Division
MFC after: 2 weeks


# a695d9b2 03-Aug-2014 Alan Cox <alc@FreeBSD.org>

Retire pmap_change_wiring(). We have never used it to wire virtual pages.
We continue to use pmap_enter() for that. For unwiring virtual pages, we
now use pmap_unwire(), which unwires a range of virtual addresses instead
of a single virtual page.

Sponsored by: EMC / Isilon Storage Division


# a844c68f 13-Jul-2014 Alan Cox <alc@FreeBSD.org>

Implement pmap_unwire(). See r268327 for the motivation behind this change.


# 44f1c916 22-Mar-2014 Bryan Drewery <bdrewery@FreeBSD.org>

Rename global cnt to vm_cnt to avoid shadowing.

To reduce the diff struct pcu.cnt field was not renamed, so
PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in
kvm(3) and vmstat(8). The goal was to not affect externally used KPI.

Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the
the global cnt variable.

Exp-run revealed no ports using it directly.

No objection from: arch@
Sponsored by: EMC / Isilon Storage Division


# a3845e42 01-Feb-2014 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Avoid spurious compiler warning about an uninitialized variable.


# b899f5d5 16-Nov-2013 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Make sure that TLB1 mappings are aligned correctly.


# bdac4360 11-Nov-2013 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Follow up r223485, which made AIM use the ABI thread pointer instead of
PCPU fields for curthread, by doing the same to Book-E. This closes
some potential races switching between CPUs. As a side effect, it turns out
the AIM and Book-E swtch.S implementations were the same to within a few
registers, so move that to powerpc/powerpc.

MFC after: 3 months


# 0f3406e0 06-Nov-2013 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Do not panic if pmap_mincore() is called. This prevents crashing userland
binaries from bringing down the kernel.

MFC after: 3 days


# 0b8a792e 26-Oct-2013 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Make devices with registers into the KVA region work reliably. Without this,
previous KVA allocations (which the PMAP lazily invalidates) in TLB0 could
shadow device maps in TLB1. Add a big block comment about some of the
caveats with this approach.


# 33724f17 26-Oct-2013 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Interrelated improvements to early boot mappings:
- Remove explicit requirement that the SOC registers be found except as an
optimization (although the MPC85XX LAW drivers still require they be found
externally, which should change).
- Remove magic CCSRBAR_VA value.
- Allow bus_machdep.c's early-boot code to handle non 1:1 mappings and
systems not in real-mode or global 1:1 maps in early boot.
- Allow pmap_mapdev() on Book-E to reissue previous addresses if the
area is already mapped. Additionally have it check all mappings, not
just the CCSR area.

This allows the console on e500 systems to actually work on systems where
the boot loader was not kind enough to set up a 1:1 mapping before starting
the kernel.


# 928554a2 26-Oct-2013 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Fix concurrency issues with TLB1 updates and make pmap_kextract() search
TLB1 mappings as well, which is required for the console to work after
r257111.


# 8bcd59f2 26-Oct-2013 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Add pmap_mapdev_attr() and pmap_kenter_attr() interfaces. pmap_set_memattr()
is slightly more complicated and is left unimplemented for now. Also
prevent pmap_mapdev() from mapping over the kernel and KVA regions if
devices happen to have high physical addresses.

MFC after: 2 weeks


# c8d23639 21-Oct-2013 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Make hard-wired TLB allocations be at minimum one page. This is required by
some implementations, most notably (in my case) QEMU's e500 emulation.


# deb179bb 19-Sep-2013 Alan Cox <alc@FreeBSD.org>

The pmap function pmap_clear_reference() is no longer used. Remove it.

pmap_clear_reference() has had exactly one caller in the kernel for
several years, more precisely, since FreeBSD 8. Now, that call no
longer exists.

Approved by: re (kib)
Sponsored by: EMC / Isilon Storage Division


# e68c64f0 22-Aug-2013 Konstantin Belousov <kib@FreeBSD.org>

Revert r254501. Instead, reuse the type stability of the struct pmap
which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE
zone. Initialize the pmap lock in the vmspace zone init function, and
remove pmap lock initialization and destruction from pmap_pinit() and
pmap_release().

Suggested and reviewed by: alc (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation


# c7aebda8 09-Aug-2013 Attilio Rao <attilio@FreeBSD.org>

The soft and hard busy mechanism rely on the vm object lock to work.
Unify the 2 concept into a real, minimal, sxlock where the shared
acquisition represent the soft busy and the exclusive acquisition
represent the hard busy.
The old VPO_WANTED mechanism becames the hard-path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becames an interlock for this functionality:
it can be held in both read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.

Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
and vm_page_grab are being executed. This will be very helpful
once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavilly changed this is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.

Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff, kib
Tested by: gavin, bapt (older version)
Tested by: pho, scottl


# 5df87b21 07-Aug-2013 Jeff Roberson <jeff@FreeBSD.org>

Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by: alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division


# 9af6d512 21-May-2013 Attilio Rao <attilio@FreeBSD.org>

o Relax locking assertions for vm_page_find_least()
o Relax locking assertions for pmap_enter_object() and add them also
to architectures that currently don't have any
o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade
operation on the per-object rwlock
o Use all the mechanisms above to make vm_map_pmap_enter() to work
mostl of the times only with readlocks.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc


# 658f180b 17-May-2013 Alan Cox <alc@FreeBSD.org>

Relax the object locking assertion in pmap_enter_locked().

Reviewed by: attilio
Sponsored by: EMC / Isilon Storage Division


# e8a4a618 14-Mar-2013 Konstantin Belousov <kib@FreeBSD.org>

Add pmap function pmap_copy_pages(), which copies the content of the
pages around, taking array of vm_page_t both for source and
destination. Starting offsets and total transfer size are specified.

The function implements optimal algorithm for copying using the
platform-specific optimizations. For instance, on the architectures
were the direct map is available, no transient mappings are created,
for i386 the per-cpu ephemeral page frame is used. The code was
typically borrowed from the pmap_copy_page() for the same
architecture.

Only i386/amd64, powerpc aim and arm/arm-v6 implementations were
tested at the time of commit. High-level code, not committed yet to
the tree, ensures that the use of the function is only allowed after
explicit enablement.

For sparc64, the existing code has known issues and a stab is added
instead, to allow the kernel linking.

Sponsored by: The FreeBSD Foundation
Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6)
MFC after: 2 weeks


# 89f6b863 08-Mar-2013 Attilio Rao <attilio@FreeBSD.org>

Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
At this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit. Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


# dc1558d1 27-Feb-2013 Attilio Rao <attilio@FreeBSD.org>

Merge from vmobj-rwlock:
VM_OBJECT_LOCKED() macro is only used to implement a custom version
of lock assertions right now (which likely spread out thanks to
copy and paste).
Remove it and implement actual assertions.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc
Tested by: pho


# a4915c21 26-Feb-2013 Attilio Rao <attilio@FreeBSD.org>

Merge from vmc-playground branch:
Replace the sub-optimal uma_zone_set_obj() primitive with more modern
uma_zone_reserve_kva(). The new primitive reserves before hand
the necessary KVA space to cater the zone allocations and allocates pages
with ALLOC_NOOBJ. More specifically:
- uma_zone_reserve_kva() does not need an object to cater the backend
allocator.
- uma_zone_reserve_kva() can cater M_WAITOK requests, in order to
serve zones which need to do uma_prealloc() too.
- When possible, uma_zone_reserve_kva() uses directly the direct-mapping
by uma_small_alloc() rather than relying on the KVA / offset
combination.

The removal of the object attribute allows 2 further changes:
1) _vm_object_allocate() becomes static within vm_object.c
2) VM_OBJECT_LOCK_INIT() is removed. This function is replaced by
direct calls to mtx_init() as there is no need to export it anymore
and the calls aren't either homogeneous anymore: there are now small
differences between arguments passed to mtx_init().

Sponsored by: EMC / Isilon storage division
Reviewed by: alc (which also offered almost all the comments)
Tested by: pho, jhb, davide


# e33d0ab8 03-Nov-2012 Alan Cox <alc@FreeBSD.org>

Replace all uses of the page queues lock by a R/W lock that is private
to this pmap.

Eliminate two redundant #include's.

Tested by: marcel


# 4a49da83 03-Nov-2012 Marcel Moolenaar <marcel@FreeBSD.org>

1. Have the APs initialize the TLB1 entries from what has been
programmed on the BSP during (early) boot. This makes sure
that the APs get configured the same as the BSP, irrspective
of how FreeBSD was loaded.
2. Make sure to flush the dcache after writing the TLB1 entries
to the boot page. The APs aren't part of the coherency domain
just yet.
3. Set pmap_bootstrapped after calling pmap_bootstrap(). The
FDT code now maps the devices (like OF), and this resulted
in a panic.
4. Since we pre-wire the CCSR, make sure not to map chunks of
it in pmap_mapdev().


# 8d9e6d9f 10-Jul-2012 Alan Cox <alc@FreeBSD.org>

Avoid recursion on the pvh global lock in the aim oea pmap.

Correct the return type of the pmap_ts_referenced() implementations.

Reported by: jhibbits [1]
Tested by: andreast


# 816da220 02-Jul-2012 Marcel Moolenaar <marcel@FreeBSD.org>

Invalidate any TLB1 entries we don't need. The firmware (e.g. U-Boot)
may have added entries that conflict with TLB0 entries.


# b504a44a 27-May-2012 Rafal Jaworowski <raj@FreeBSD.org>

Remove redundant check, we catch ULE platform support in common
sys/kern/sched_ule.c


# 20b79612 24-May-2012 Rafal Jaworowski <raj@FreeBSD.org>

Fix physical address type to vm_paddr_t.


# a45d9127 24-May-2012 Marcel Moolenaar <marcel@FreeBSD.org>

o Rename kernload_ap to bp_kernelload. This to introduce a common prefix
for variables that live in the boot page.
o Add bp_trace (yes, it's in the boot page) that gets zeroed before we
try to wake a core and to which the core being woken can write markers
so that we know where the core was in case it doesn't wake up. The
boot code does not yet write markers (too follow).
o Disable the boot page translation to allow the last 4K page to be used
for whatever we please. It would get mapped otherwise.
o Fix kernstart in the case of SMP. The start argument is typically page
aligned due to the alignment requirements that come with having a boot
page. The point of using trunc_page is that we get the actual load
address given that the entry point is immediately following the ELF
headers. In the SMP case this ended up exactly 4K after the load
address. Hence subtracting 1 from start.


# 578113aa 28-Sep-2011 Konstantin Belousov <kib@FreeBSD.org>

Remove locking of the vm page queues from several pmaps, which only
protected the dirty mask updates. The dirty mask updates are handled
by atomics after the r225840.

Submitted by: alc
Tested by: flo (sparc64)
MFC after: 2 weeks


# 3407fefe 06-Sep-2011 Konstantin Belousov <kib@FreeBSD.org>

Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by: alc, attilio
Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by: re (bz)


# d98d0ce2 09-Aug-2011 Konstantin Belousov <kib@FreeBSD.org>

- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag
to VPO_UNMANAGED (and also making the flag protected by the vm object
lock, instead of vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and
VPO_UNMANAGED. As a consequence, pmap code now can use use just
VPO_UNMANAGED to decide whether the page is unmanaged.

Reviewed by: alc
Tested by: pho (x86, previous version), marius (sparc64),
marcel (arm, ia64, powerpc), ray (mips)
Sponsored by: The FreeBSD Foundation
Approved by: re (bz)


# 2b5bf115 02-Aug-2011 Marcel Moolenaar <marcel@FreeBSD.org>

Add support for Juniper's loader. The difference between FreeBSD's and
Juniper's loader is that Juniper's loader maps all of the kernel and
preloaded modules at the right virtual address before jumping into the
kernel. FreeBSD's loader simply maps 16MB using the physical address
and expects the kernel to jump through hoops to relocate itself to
it's virtual address. The problem with the FreeBSD loader's approach is
that it typically maps too much or too little. There's no harm if it's
too much (other than wasting space), but if it's too little then the
kernel will simply not boot, because the first thing the kernel needs
is the bootinfo structure, which is never mapped in that case. The page
fault that early is fatal.

The changes constitute:
1. Do not remap the kernel in locore.S. We're mapped where we need to
be so we can pretty much call into C code after setting up the
stack.
2. With kernload and kernload_ap not set in locore.S, we need to set
them in pmap.c: kernload gets defined when we preserve the TLB1.
Here we also determine the size of the kernel mapped. kernload_ap
is set first thing in the pmap_bootstrap() method.
3. Fix tlb1_map_region() and its use to properly externd the mapped
kernel size to include low-level data structures.

Approved by: re (blanket)
Obtained from: Juniper Networks, Inc


# c7c2767e 16-Jun-2011 Attilio Rao <attilio@FreeBSD.org>

Remove pc_other_cpus and pc_cpumask usage from powerpc support.

Tested and reviewed by: andreast


# d098f930 31-May-2011 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

On multi-core, multi-threaded PPC systems, it is important that the threads
be brought up in the order they are enumerated in the device tree (in
particular, that thread 0 on each core be brought up first). The SLIST
through which we loop to start the CPUs has all of its entries added with
SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration
and so AP startup would always fail in such situations (causing a machine
check or RTAS failure). Fix this by changing the SLIST into an STAILQ,
and inserting new CPUs at the end.

Reviewed by: jhb


# ebf84cec 27-May-2011 Marcel Moolenaar <marcel@FreeBSD.org>

Better support different kernel hand-offs. When loaded directly
from U-Boot, the kernel is passed a standard argc/argv pair.
The Juniper loader passes the metadata pointer as the second
argument and passes 0 in the first. The FreeBSD loader passes
the metadata pointer in the first argument.

As such, have locore preserve the first 2 arguments in registers
r30 & r31. Change e500_init() to accept these arguments. Don't
pass global offsets (i.e. kernel_text and _end) as arguments to
e500_init(). We can reference those directly.

Rename e500_init() to booke_init() now that we're changing the
prototype.

In booke_init(), "decode" arg1 and arg2 to obtain the metadata
pointer correctly. For the U-Boot case, clear SBSS and BSS and
bank on having a static FDT for now. This allows loading the
ELF kernel and jumping to the entry point without trampoline.


# 6a76463e 27-May-2011 Marcel Moolenaar <marcel@FreeBSD.org>

Wire the kernel using TLB1 entry 0 rather than entry 1. A more recent
U-Boot as found on the P1020RDB doesn't like it when we use entry 1
(for some reason) whereas an older U-Boot doesn't mind if we use entry
0. If anything else, this simplifies the code a bit.


# c98b3586 18-May-2011 Attilio Rao <attilio@FreeBSD.org>

Revert r222069,222068 as they were intended to be committed to the
largeSMP branch.

Reported by: pluknet


# 1a203896 18-May-2011 Attilio Rao <attilio@FreeBSD.org>

Fix warning spit out.

Reported by: sbruno


# c47dd3db 09-May-2011 Attilio Rao <attilio@FreeBSD.org>

Add the powerpc support.

Note that there is a dirty hack for calling openpic_write(), but
nwhitehorn approved it.

Discussed with: nwhitehorn


# 4053b05b 21-Jan-2011 Sergey Kandaurov <pluknet@FreeBSD.org>

Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize.

Submitted by: perryh pluto.rain.com (previous version)
Reviewed by: jhb
Approved by: kib (mentor)
Tested by: universe


# 1d56a280 11-Nov-2010 Rafal Jaworowski <raj@FreeBSD.org>

Use local TLB_UNLOCKED marker instead of MTX_UNOWNED for Book-E PowerPC trap
routines.

This unbreaks Book-E build after the recent machine/mutex.h removal.

While there move tlb_*lock() prototypes to machine/tlb.h.

Submitted by: jhb


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 33529b98 14-Sep-2010 Peter Grehan <grehan@FreeBSD.org>

Introduce inheritance into the PowerPC MMU kobj interface.

include/mmuvar.h - Change the MMU_DEF macro to also create the class
definition as well as define the DATA_SET. Add a macro, MMU_DEF_INHERIT,
which has an extra parameter specifying the MMU class to inherit methods
from. Update the comments at the start of the header file to describe the
new macros.

booke/pmap.c
aim/mmu_oea.c
aim/mmu_oea64.c - Collapse mmu_def_t declaration into updated MMU_DEF macro

The MMU_DEF_INHERIT macro will be used in the PS3 MMU implementation to
allow it to inherit the stock powerpc64 MMU methods.

Reviewed by: nwhitehorn


# d1d3233e 11-Jul-2010 Rafal Jaworowski <raj@FreeBSD.org>

Convert Freescale PowerPC platforms to FDT convention.

The following systems are affected:

- MPC8555CDS
- MPC8572DS

This overhaul covers the following major changes:

- All integrated peripherals drivers for Freescale MPC85XX SoC, which are
currently in the FreeBSD source tree are reworked and adjusted so they
derive config data out of the device tree blob (instead of hard coded /
tabelarized values).

- This includes: LBC, PCI / PCI-Express, I2C, DS1553, OpenPIC, TSEC, SEC,
QUICC, UART, CFI.

- Thanks to the common FDT infrastrucutre (fdtbus, simplebus) we retire
ocpbus(4) driver, which was based on hard-coded config data.

Note that world for these platforms has to be built WITH_FDT.

Reviewed by: imp
Sponsored by: The FreeBSD Foundation


# ed571bca 23-Jun-2010 Marcel Moolenaar <marcel@FreeBSD.org>

Remove debugging printf() -- that is, I assume it was for debugging :-)


# 9124d0d6 11-Jun-2010 Alan Cox <alc@FreeBSD.org>

Relax one of the new assertions in pmap_enter() a little. Specifically,
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)


# ce186587 10-Jun-2010 Alan Cox <alc@FreeBSD.org>

Reduce the scope of the page queues lock and the number of
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines. Update a stale comment.

Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked. Add a comment documenting this.

Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.

Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick(). (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)

Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page. Assert this rather than returning "0"
and "FALSE" respectively.

ARM:

Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().

Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.

PowerPC/AIM:

moea*_page_exits_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete. Therefore, the check
for moea_initialized can be eliminated.

Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().

The last parameter to moea*_clear_bit() is never used. Eliminate it.

PowerPC/BookE:

Simplify mmu_booke_page_exists_quick()'s control flow.

Reviewed by: kib@


# 85a71b25 05-Jun-2010 Alan Cox <alc@FreeBSD.org>

Don't set PG_WRITEABLE in pmap_enter() unless the page is managed.

Correct a typo in a nearby comment on sparc64.


# b5bde831 01-Jun-2010 Alan Cox <alc@FreeBSD.org>

In the case that mmu_booke_enter_locked() is changing the attributes of a
mapping but not changing the physical page being mapped, the wrong flags
were being inspected in order to determine whether or not to flush the
instruction cache. The effect of looking at the wrong flags was that the
instruction cache was never being flushed.

Reviewed by: marcel


# c46b90e9 26-May-2010 Alan Cox <alc@FreeBSD.org>

Push down page queues lock acquisition in pmap_enter_object() and
pmap_is_referenced(). Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively. In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.

Assert that the page is managed in pmap_is_referenced().

On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.

Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting. This scenario is described in detail by a
comment.

Correct a spelling error in vm_page_dontneed().

Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.

Add object locking to vnode_pager_generic_putpages(). This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.

Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().

Change vnode_pager_generic_putpages() to the modern-style of function
definition. Also, change the name of one of the parameters to follow
virtual memory system naming conventions.

Reviewed by: kib


# 567e51e1 24-May-2010 Alan Cox <alc@FreeBSD.org>

Roughly half of a typical pmap_mincore() implementation is machine-
independent code. Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified(). Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization. Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean(). Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by: kib (an earlier version)


# 9ab6032f 16-May-2010 Alan Cox <alc@FreeBSD.org>

On entry to pmap_enter(), assert that the page is busy. While I'm
here, make the style of assertion used by pmap_enter() consistent
across all architectures.

On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.

With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy. Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.

Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try. Previously, the call to pmap_remove_write()
would have failed silently.


# 3c4a2440 08-May-2010 Alan Cox <alc@FreeBSD.org>

Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free(). Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped(). (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.


# c7a0df65 30-Apr-2010 Alan Cox <alc@FreeBSD.org>

MFamd64/i386 r207205
Clearing a page table entry's accessed bit and setting the page's
PG_REFERENCED flag in pmap_protect() can't really be justified, so
don't do it.

Additionally, two changes that make this pmap behave like the others do:

Change pmap_protect() such that it calls vm_page_dirty() only if the
page is managed.

Change pmap_remove_write() such that it doesn't clear a page table
entry's accessed bit.


# 2965a453 29-Apr-2010 Kip Macy <kmacy@FreeBSD.org>

On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib


# 7b85f591 24-Apr-2010 Alan Cox <alc@FreeBSD.org>

Resurrect pmap_is_referenced() and use it in mincore(). Essentially,
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters. For example, in mincore(), clearing the reference
bits has two negative consequences. First, it throws off the activity
count calculations performed by the page daemon. Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should. Consequently, the page could be
deactivated prematurely by the page daemon. Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page. However, there is a second problem for which
that is not a solution. In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping. Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!


# dfeca187 30-Mar-2010 Marcel Moolenaar <marcel@FreeBSD.org>

MFC rev 198341 and 198342:
o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
that translates the vm_map_t argumument to pmap_t.
o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
it replaces the pmap_page_executable() function, added to solve
the I-cache problem in uiomove_fromphys().
o In proc_rwmem() call vm_sync_icache() when writing to a page that
has execute permissions. This assures that when breakpoints are
written, the I-cache will be coherent and the process will actually
hit the breakpoint.
o This also fixes the Book-E PMAP implementation that was missing
necessary locking while trying to deal with the I-cache coherency
in pmap_enter() (read: mmu_booke_enter_locked).


# 7733cf8f 11-Feb-2010 Matt Jacob <mjacob@FreeBSD.org>

MFC a number of changes from head for ISP (203478,203463,203444,202418,201758,
201408,201325,200089,198822,197373,197372,197214,196162). Since one of those
changes was a semicolon cleanup from somebody else, this touches a lot more.


# c2ede4b3 07-Jan-2010 Martin Blapp <mbr@FreeBSD.org>

Remove extraneous semicolons, no functional changes.

Submitted by: Marc Balmer <marc@msys.ch>
MFC after: 1 week


# 1a4fcaeb 21-Oct-2009 Marcel Moolenaar <marcel@FreeBSD.org>

o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
that translates the vm_map_t argumument to pmap_t.
o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
it replaces the pmap_page_executable() function, added to solve
the I-cache problem in uiomove_fromphys().
o In proc_rwmem() call vm_sync_icache() when writing to a page that
has execute permissions. This assures that when breakpoints are
written, the I-cache will be coherent and the process will actually
hit the breakpoint.
o This also fixes the Book-E PMAP implementation that was missing
necessary locking while trying to deal with the I-cache coherency
in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.


# 01381811 24-Jul-2009 John Baldwin <jhb@FreeBSD.org>

Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses. The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks


# 50c202c5 23-Jun-2009 Jeff Roberson <jeff@FreeBSD.org>

Implement a facility for dynamic per-cpu variables.
- Modules and kernel code alike may use DPCPU_DEFINE(),
DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined
PCPU_*. Requires only one extra instruction more than PCPU_* and is
virtually the same as __thread for builtin and much faster for shared
objects. DPCPU variables can be initialized when defined.
- Modules are supported by relocating the module's per-cpu linker set
over space reserved in the kernel. Modules may fail to load if there
is insufficient space available.
- Track space available for modules with a one-off extent allocator.
Free may block for memory to allocate space for an extent.

Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas


# cd2b3416 13-Jun-2009 Alan Cox <alc@FreeBSD.org>

Correct the method of waking the page daemon when the number of allocated
pv entries surpasses the high water mark. The problem was that the page
daemon would only be awakened the first time that the high water mark was
surpassed. (The variable "pagedaemon_waken" is a non-working vestige of
FreeBSD 4.x, in which it was external and reset by the page daemon whenever
it ran. This reset allowed subsequent wakeups by the pv entry allocator.)


# 661ee6ee 13-Jun-2009 Rafal Jaworowski <raj@FreeBSD.org>

Fix Book-E/MPC85XX build. Some prototypes were wrong and got revealed with
the recent kobj signature checking.


# 29794416 05-Jun-2009 Rafal Jaworowski <raj@FreeBSD.org>

Fill PTEs covering kernel code and data.

Without this fix pte_vatopa() was not able to retrieve physical address of
data structures inside kernel, for example EFAULT was reported while acessing
/dev/kmem ('netstat -nr').

Submitted by: Piotr Ziecik
Obtained from: Semihalf


# 81619265 26-May-2009 Rafal Jaworowski <raj@FreeBSD.org>

Set PG_WRITEABLE in Book-E pmap_enter[_locked] if it creates a mapping that
permits write access. This is similar to r192671.

Pointed out and reviewed by: alc


# 28bb01e5 21-May-2009 Rafal Jaworowski <raj@FreeBSD.org>

Initial support for SMP on PowerPC MPC85xx.

Tested with Freescale dual-core MPC8572DS development system.

Obtained from: Freescale, Semihalf


# b40ce02a 13-May-2009 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Factor out platform dependent things unrelated to device drivers into a
new platform module. These are probed in early boot, and have the
responsibility of determining the layout of physical memory, determining
the CPU timebase frequency, and handling the zoo of SMP mechanisms
found on PowerPC.

Reviewed by: marcel, raj
Book-E parts by: raj


# d6a8fa05 23-Apr-2009 Marcel Moolenaar <marcel@FreeBSD.org>

Remove PTE_ISFAKE. While here remove code
between "#if 0" and "#endif".


# 48d6f243 04-Apr-2009 Marcel Moolenaar <marcel@FreeBSD.org>

Implement kernel core dump support for Book-E processors.
Both raw physical memory dumps and virtual minidumps are
supported. The default being minidumps.

Obtained from: Juniper Networks


# 802cb57e 28-Feb-2009 Ed Schouten <ed@FreeBSD.org>

Add memmove() to the kernel, making the kernel compile with Clang.

When copying big structures, LLVM generates calls to memmove(), because
it may not be able to figure out whether structures overlap. This caused
linker errors to occur. memmove() is now implemented using bcopy().
Ideally it would be the other way around, but that can be solved in the
future. On ARM we don't do add anything, because it already has
memmove().

Discussed on: arch@
Reviewed by: rdivacky


# 0f31d4ea 13-Jan-2009 Rafal Jaworowski <raj@FreeBSD.org>

Clean up BookE pmap.

Improve comments, eliminate redundant debug output, fix style(9) and other
minor tweaks for code readability.

Obtained from: Freescale, Semihalf


# b2b734e7 13-Jan-2009 Rafal Jaworowski <raj@FreeBSD.org>

Rework BookE pmap towards multi-core support.

o Eliminate tlb0[] (a s/w copy of TLB0)
- The table contents cannot be maintained reliably in multiple MMU
environments, where asynchronous events (invalidations from other cores)
can change our local TLB0 contents underneath.
- Simplify and optimize TLB flushing: system wide invalidations are
performed using tlbivax instruction (propagates to other cores), for
local MMU invalidations a new optimized routine (assembly) is introduced.

o Improve and simplify TID allocation and management.
- Let each core keep track of its TID allocations.
- Simplify TID recycling, eliminate dead code.
- Drop the now unused powerpc/booke/support.S file.

o Improve page tables management logic.

o Simplify TLB1 manipulation routines.

o Other improvements and polishing.

Obtained from: Freescale, Semihalf


# 9b85e175 24-Oct-2008 Marcel Moolenaar <marcel@FreeBSD.org>

In mmu_booke_mapdev(), handle mappings that cannot be represented
by a single TLB entry. The boot ROM on the MPC85555CDS is 8MB, for
example, and in order to map that we need 2 4MB TLB entries.


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# c8e78079 28-Aug-2008 Rafal Jaworowski <raj@FreeBSD.org>

Move initialization of tlb0, ptbl_bufs and kernel_pdir regions after we are
100% sure that TLB1 mapping covers for them; previously we could lock the CPU
with an untranslated references.

Obtained from: Semihalf


# 959aea56 26-Aug-2008 Rafal Jaworowski <raj@FreeBSD.org>

Improve kernel stack handling on e500.

- Allocate thread0.td_kstack in pmap_bootstrap(), provide guard page
- Switch to thread0.td_kstack as soon as possible i.e. right after return
from e500_init() and before mi_startup() happens
- Clean up temp stack area
- Other minor cosmetics in machdep.c

Obtained from: Semihalf


# b390a5ba 11-Jun-2008 Wojciech A. Koszek <wkoszek@FreeBSD.org>

Fix a typo in a comment.


# 1ec1304b 17-May-2008 Alan Cox <alc@FreeBSD.org>

Retire pmap_addr_hint(). It is no longer used.


# b66bd41d 27-Apr-2008 Marcel Moolenaar <marcel@FreeBSD.org>

Eliminate track_modified_needed(), better known as pmap_track_modified()
on other platforms. We no longer need it because we do not create managed
mappings within the clean submap.

Pointed out by: alc


# 6b7ba544 03-Mar-2008 Rafal Jaworowski <raj@FreeBSD.org>

Initial support for Freescale PowerQUICC III MPC85xx system-on-chip family.

The PQ3 is a high performance integrated communications processing system
based on the e500 core, which is an embedded RISC processor that implements
the 32-bit Book E definition of the PowerPC architecture. For details refer
to: http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=MPC8555E

This port was tested and successfully run on the following members of the PQ3
family: MPC8533, MPC8541, MPC8548, MPC8555.

The following major integrated peripherals are supported:

* On-chip peripherals bus
* OpenPIC interrupt controller
* UART
* Ethernet (TSEC)
* Host/PCI bridge
* QUICC engine (SCC functionality)

This commit brings the main functionality and will be followed by individual
drivers that are logically separate from this base.

Approved by: cognet (mentor)
Obtained from: Juniper, Semihalf
MFp4: e500