History log of /freebsd-current/sys/riscv/riscv/pmap.c
# d7adf3b4 15-May-2024 Mitchell Horne <mhorne@FreeBSD.org>

riscv: fix L0 PTE setup (Sv48)

Per the Privilege Spec, the Accessed (A) or Dirty (D) bits must only be
set for a leaf PTE.

It seems newer versions of QEMU have started to enforce this
requirement, and without this change, pmap_bootstrap() hangs when
switching to Sv48 mode.

Reviewed by: jrtc27, markj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45210
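
As an illustration of the rule this commit enforces, here is a standalone sketch (not the committed pmap.c code) of building a non-leaf table entry; the PTE_* bit positions follow the RISC-V privileged spec, and the names are redefined locally so the snippet stands alone:

    #include <stdint.h>

    #define PTE_V           (1ULL << 0)     /* valid */
    #define PTE_R           (1ULL << 1)     /* readable (leaf only) */
    #define PTE_W           (1ULL << 2)     /* writable (leaf only) */
    #define PTE_X           (1ULL << 3)     /* executable (leaf only) */
    #define PTE_A           (1ULL << 6)     /* accessed: leaf PTEs only */
    #define PTE_D           (1ULL << 7)     /* dirty: leaf PTEs only */
    #define PTE_PPN_SHIFT   10              /* PPN field starts at bit 10 */
    #define PAGE_SHIFT      12

    /*
     * A non-leaf ("table") entry, such as the L0 entry set up during the
     * Sv48 bootstrap, carries only PTE_V plus the PPN of the next-level
     * table page: no PTE_R/W/X and, per the spec, no PTE_A or PTE_D.
     */
    static inline uint64_t
    make_table_pte(uint64_t next_level_pa)
    {
            return (((next_level_pa >> PAGE_SHIFT) << PTE_PPN_SHIFT) | PTE_V);
    }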


# dd03eafa 21-Apr-2024 Doug Moore <dougm@FreeBSD.org>

riscv: create a convenience composite macro

Define PTE_TO_VM_PAGE to compose the PHYS_TO_VM_PAGE and PTE_TO_PHYS
macros. Use it where appropriate, and drop some variables that it
makes unnecessary.

Reviewed by: jhb (previous version)
Differential Revision: https://reviews.freebsd.org/D44700


# 1f1b2286 31-Jan-2024 John Baldwin <jhb@FreeBSD.org>

pmap: Convert boolean_t to bool.

Reviewed by: kib (older version)
Differential Revision: https://reviews.freebsd.org/D39921


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree via automated scripting, with two
minor fixups to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# d0941ed9 08-Nov-2023 Bojan Novković <bojan.novkovic@fer.hr>

riscv: Add a leaf PTP when pmap_enter(psind=1) creates a wired mapping

Let pmap_enter_l2() create wired mappings. In particular, allocate a
leaf PTP for use during demotion. This is the last pmap which requires
such a change ahead of reverting commit 64087fd7f372.

Reviewed by: markj
Sponsored by: Google, Inc. (GSoC 2023)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D41633


# 368b9736 03-Nov-2023 Mark Johnston <markj@FreeBSD.org>

riscv: Update a variable name to match a comment

This makes pmap_insert_pt_page() consistent with arm64 and amd64. No
functional change intended.

Reported by: alc
Fixes: 7703ac2e983b ("riscv: Port improvements from arm64/amd64 pmaps, part 1")


# 71b77a71 02-Nov-2023 Mark Johnston <markj@FreeBSD.org>

riscv: Remove unnecessary invalidations in pmap_enter_quick_locked()

This function always overwrites an invalid PTE, so if
pmap_try_insert_pv_entry() fails it is certainly not necessary to
invalidate anything, because the PTE has not yet been written by that
point.

It should also not be necessary to invalidate TLBs after overwriting an
invalid entry. In principle the TLB could cache negative entries, but
then the worst case scenario is a spurious fault. Since pmap_enter()
does not bother issuing an sfence.vma, pmap_enter_quick_locked() should
behave similarly.

Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D42291


# 0b8372b7 02-Nov-2023 Mark Johnston <markj@FreeBSD.org>

riscv: Port improvements from arm64/amd64 pmaps, part 3

- Let pmap_enter_quick_locked() trigger superpage promotions.

Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D42290


# 3c4f46b0 02-Nov-2023 Mark Johnston <markj@FreeBSD.org>

riscv: Port improvements from arm64/amd64 pmaps, part 2

- Give pmap_promote_l2() a return value indicating whether or not
promotion succeeded.
- Check pmap_ps_enabled() in pmap_promote_l2() rather than making
callers do it.
- Annotate superpages_enabled with __read_frequently.

Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D42289


# 7703ac2e 02-Nov-2023 Mark Johnston <markj@FreeBSD.org>

riscv: Port improvements from arm64/amd64 pmaps, part 1

- When promoting, do not require that all PTEs have PTE_A set.
Instead, record whether they did and store this information in the
PTP's valid bits.
- Synchronize some comments in pmap_promote_l2().
- Make pmap_promote_l2() scan starting from the end of the 2MB range
instead of the beginning. See the commit log for 9d1b7fa31f510 for
justification of this, which I believe applies here as well.

Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D42288


# 95334592 02-Nov-2023 Mark Johnston <markj@FreeBSD.org>

riscv: Retire PMAP_INLINE

pmap_kremove() is not called from within pmap.c, so there's no reason to
inline it. No functional change intended.

Reviewed by: alc, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42287


# 74e4a8d2 22-Aug-2023 Mina Galić <freebsd@igalic.co>

pmap: add pmap_kextract(9) man page

Add a man page for pmap_kextract(9), with an alias to vtophys(9). This man
page is based on pmap_extract(9).

Add it as cross reference in pmap(9), and add comments above the
function implementations.

Co-authored-by: Graham Perrin <grahamperrin@gmail.com>
Co-authored-by: mhorne
Sponsored by: The FreeBSD Foundation
Pull Request: https://github.com/freebsd/freebsd-src/pull/827


# 8882b785 07-Oct-2021 Konstantin Belousov <kib@FreeBSD.org>

add pmap_active_cpus()

For amd64, i386, arm, and riscv, i.e. all architectures except arm64,
a custom implementation is provided, since we maintain the bitmask of
active CPUs anyway.

Arm64 uses a somewhat naive iteration over all CPUs, matching each
CPU's current vmspace's pmap against the argument. It is not guaranteed
that vmspace->pmap is the same as the active pmap, but the inaccuracy
should be tolerable.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32360


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 29edff0d 16-Jul-2023 Alan Cox <alc@FreeBSD.org>

arm64/riscv pmap: Initialize the pmap's pm_pvchunk field

I believe that there are two reasons that the missing TAILQ
initialization operations haven't caused a problem. First, the TAILQ
head's first field is being initialized to zeroes elsewhere. Second,
the first access to the TAILQ head's last field is by
TAILQ_INSERT_HEAD(), which assigns to the last field without reading
it when the first field is NULL.

Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D41118
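
A standalone sketch of the behavior described above, using the sys/queue.h TAILQ macros (the pv_chunk stand-in type is hypothetical); it shows why a zero-filled head happened to survive and why an explicit TAILQ_INIT() is still the right fix:

    #include <sys/queue.h>
    #include <string.h>

    struct pv_chunk_stub {                  /* stand-in for the real pv_chunk */
            TAILQ_ENTRY(pv_chunk_stub) pc_list;
    };
    TAILQ_HEAD(pch, pv_chunk_stub);

    int
    main(void)
    {
            struct pch head;
            struct pv_chunk_stub pc;

            /*
             * A zero-filled head happens to survive the first
             * TAILQ_INSERT_HEAD(): when tqh_first is NULL, the macro assigns
             * tqh_last without ever reading it.  TAILQ_INIT() makes that
             * invariant explicit instead of accidental.
             */
            memset(&head, 0, sizeof(head));
            TAILQ_INIT(&head);
            TAILQ_INSERT_HEAD(&head, &pc, pc_list);
            return (0);
    }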


# b8cc13fa 16-Jul-2023 Doug Moore <dougm@FreeBSD.org>

riscv pmap: another vm_radix_init

pmap_pinit0 also needs to initialize a vm_radix, in case vm_radix_init
does anything but zeroing fields.

Reported by: alc
Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D41055


# 3e04ae43 14-Jul-2023 Doug Moore <dougm@FreeBSD.org>

vm_radix_init: use initializer

Several vm_radix tries are not initialized with vm_radix_init. That
works, for now, since static initialization zeroes the root field
anyway, but if initialization changes, these tries will fail. Add
missing initializer calls.

Reviewed by: alc, kib, markj
Differential Revision: https://reviews.freebsd.org/D40971


# 73cc3dbc 25-May-2023 John Baldwin <jhb@FreeBSD.org>

riscv pmap: Add an __unused wrapper for a variable only used under PV_STATS.


# ef0a711f 25-May-2023 Alfredo Mazzinghi <am2419@cam.ac.uk>

riscv: Use PMAP_MAPDEV_EARLY_SIZE in locore and pmap_bootstrap

Use PMAP_MAPDEV_EARLY_SIZE instead of assuming that its value is always
L2_SIZE. Add compile-time assertions to check that the size matches the
expectations in locore.

Reviewed by: mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D40110
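
A minimal sketch of the kind of compile-time check described above; the constants and the condition are placeholders, not the kernel's actual values or assertion:

    /* CTASSERT() in the kernel; _Static_assert() in standard C11. */
    #define L2_SIZE                 (1UL << 21)     /* 2 MiB Sv39 L2 leaf */
    #define PMAP_MAPDEV_EARLY_SIZE  (L2_SIZE)       /* assumed for the sketch */

    _Static_assert(PMAP_MAPDEV_EARLY_SIZE % L2_SIZE == 0,
        "PMAP_MAPDEV_EARLY_SIZE does not match the locore expectations");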


# 7245ffd1 22-May-2023 Mitchell Horne <mhorne@FreeBSD.org>

riscv: MMU detection

Detect and report the supported MMU for each CPU. Export the
capabilities to the rest of the kernel and use them in pmap_bootstrap() to
check for Sv48 support.

Reviewed by: markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39814


# 4961faaa 04-May-2023 John Baldwin <jhb@FreeBSD.org>

pmap_{un}map_io_transient: Use bool instead of boolean_t.

Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D39920


# 4d90a5af 07-Oct-2022 John Baldwin <jhb@FreeBSD.org>

sys: Consolidate common implementation details of PV entries.

Add a <sys/_pv_entry.h> intended for use in <machine/pmap.h> to
define struct pv_entry, pv_chunk, and related macros and inline
functions.

Note that powerpc does not yet use this as while the mmu_radix pmap
in powerpc uses the new scheme (albeit with fewer PV entries in a
chunk than normal due to an unused pv_pmap field in struct pv_entry),
the Book-E pmaps for powerpc use the older style PV entries without
chunks (and thus require the pv_pmap field).

Suggested by: kib
Reviewed by: kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D36685


# 1f9cc5ff 05-Oct-2022 Mitchell Horne <mhorne@FreeBSD.org>

riscv: handle kernel PTE edge-case in pmap_enter_l2()

Page table pages are never freed from the kernel pmap, instead they are
zeroed when a range is unmapped. This allows future mappings to be
constructed more quickly. Detect this scenario in pmap_enter_l2(), so we
don't fail to create a superpage mapping when the 2MB range is actually
available.

Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36885


# 99fe5237 25-Aug-2022 Mitchell Horne <mhorne@FreeBSD.org>

riscv: add an assert to pmap_remove_pages()

Similar checks exist for both arm64 and amd64, but note that for amd64
it is a bare panic().

Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36564


# 9d1aef84 05-Oct-2022 Mitchell Horne <mhorne@FreeBSD.org>

riscv: handle superpage in pmap_enter_quick_locked()

Previously, if pmap_enter_l2() was asked to re-map an existing superpage
(the result of madvise(MADV_WILLNEED) on a mapped range), it could
'fail' to do so, falling back to trying pmap_enter_quick_locked() for
each 4K virtual page. Because this function does not check if the l2
entry it finds is a superpage, it would proceed, sometimes resulting in
the creation of false PV entries.

If the relevant range was later munmap'ed, the system would panic during
the process' exit in pmap_remove_pages(), while attempting to clean up
the PV entries for mappings which no longer exist.

Instead, we should return early in the presence of an existing
superpage, as is done in other pmaps.

PR: 266108
Reviewed by: markj, alc
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36563
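
A sketch of the check this commit describes (simplified types, not the actual pmap code): on RISC-V a valid PTE with any of R/W/X set is a leaf, so a valid L2 entry with a nonzero RWX field is a 2MB superpage and the 4KB path must return early rather than descend into it.

    #include <stdint.h>
    #include <stdbool.h>

    #define PTE_V    (1ULL << 0)
    #define PTE_RWX  ((1ULL << 1) | (1ULL << 2) | (1ULL << 3))

    static bool
    l2_is_superpage(uint64_t l2e)
    {
            /* Valid + any of R/W/X set => leaf at L2, i.e. a superpage. */
            return ((l2e & PTE_V) != 0 && (l2e & PTE_RWX) != 0);
    }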


# 95b1c270 05-Oct-2022 Mitchell Horne <mhorne@FreeBSD.org>

riscv: optimize MADV_WILLNEED on existing superpages

Specifically, avoid pointless calls to pmap_enter_quick_locked() when
madvise(MADV_WILLNEED) is applied to an existing superpage mapping.

1d5ebad06c20 made the change for amd64 and arm64.

Reviewed by: markj, alc
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36563


# dd18b62c 25-Aug-2022 Mitchell Horne <mhorne@FreeBSD.org>

riscv: better CTR messages in pmap_enter_l2()

Disambiguate the failure cases.

Reviewed by: jhb
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D36562


# d5dc278e 03-Oct-2022 Mark Johnston <markj@FreeBSD.org>

riscv: Apply 8d7ee2047c5e to the riscv pmap

Reviewed by: alc
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D36840


# ec21f85a 29-Sep-2022 Mark Johnston <markj@FreeBSD.org>

riscv: Handle invalid L2 entries in pmap_extract()

While here, eliminate a single-use local variable.

PR: 266103
Reviewed by: mhorne
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36395


# f49fd63a 22-Sep-2022 John Baldwin <jhb@FreeBSD.org>

kmem_malloc/free: Use void * instead of vm_offset_t for kernel pointers.

Reviewed by: kib, markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D36549


# 7ae99f80 22-Sep-2022 John Baldwin <jhb@FreeBSD.org>

pmap_unmapdev/bios: Accept a pointer instead of a vm_offset_t.

This matches the return type of pmap_mapdev/bios.

Reviewed by: kib, markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D36548


# e6639073 23-Aug-2022 John Baldwin <jhb@FreeBSD.org>

Define _NPCM and the last PC_FREEn constant in terms of _NPCPV.

This applies one of the changes from
5567d6b4419b02a2099527228b1a51cc55a5b47d to other architectures
besides arm64.

Reviewed by: kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D36263


# 828ea49d 28-Jul-2022 Mark Johnston <markj@FreeBSD.org>

riscv: Avoid passing invalid addresses to pmap_fault()

After the addition of SV48 support, VIRT_IS_VALID() did not exclude
addresses that are in the SV39 address space hole but not in the SV48
address space hole. This can result in mishandling of accesses to that
range when in SV39 mode.

Fix the problem by modifying VIRT_IS_VALID() to use the runtime address
space bounds. Then, if the address is invalid, and pcb_onfault is set,
give vm_fault_trap() a chance to veto the access instead of panicking.

PR: 265439
Reviewed by: jhb
Reported and tested by: Robert Morris <rtm@lcs.mit.edu>
Fixes: 31218f3209ac ("riscv: Add support for enabling SV48 mode")
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35952
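
As an illustration of the runtime-bounds idea (a sketch, not the kernel's VIRT_IS_VALID()): a virtual address is usable only if it is canonical for the paging mode actually in use, i.e. bits 63..N-1 are all zero or all one (N = 39 for SV39, 48 for SV48). Checking against a compile-time SV48 bound while running in SV39 lets addresses inside the SV39 hole slip through, as described above.

    #include <stdint.h>
    #include <stdbool.h>

    static bool
    va_is_canonical(uint64_t va, int va_bits)       /* va_bits: 39 or 48 */
    {
            /* Bits va_bits-1 .. 63 must be all zero or all one. */
            uint64_t mask = ~((1ULL << (va_bits - 1)) - 1);
            uint64_t high = va & mask;

            return (high == 0 || high == mask);
    }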


# a56881d3 13-Apr-2022 John Baldwin <jhb@FreeBSD.org>

riscv: Use __diagused for variables only used in KASSERT().


# 8a0339e6 06-Apr-2022 Mitchell Horne <mhorne@FreeBSD.org>

riscv: eliminate physmap global

Since physical memory management is now handled by subr_physmem.c, the
need to keep this global array has diminished. It is not referenced
outside of early boot-time, and is populated by physmem_avail() in
pmap_bootstrap(). Just allocate the array on the stack for the duration
of its lifetime.

The check against physmap[0] in initriscv() can be dropped altogether,
as there is no consequence for excluding a memory range twice.

Reviewed by: markj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34778


# 31218f32 01-Mar-2022 Mark Johnston <markj@FreeBSD.org>

riscv: Add support for enabling SV48 mode

This increases the size of the user map from 256GB to 128TB. The kernel
map is left unchanged for now.

For now SV48 mode is left disabled by default, but can be enabled with a
tunable. Note that extant hardware does not implement SV48, but QEMU
does.

- In pmap_bootstrap(), allocate an L0 page and attempt to enable SV48
mode. If the write to SATP doesn't take, the kernel continues to run
in SV39 mode.
- Define VM_MAX_USER_ADDRESS to refer to the SV48 limit. In SV39 mode,
the region [VM_MAX_USER_ADDRESS_SV39, VM_MAX_USER_ADDRESS_SV48] is not
mappable.

Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34280
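
A sketch of composing a SATP value for the mode probe described above; the field layout follows the RV64 privileged spec (MODE in bits 63:60, ASID in 59:44, PPN in 43:0), while the probe logic itself is paraphrased from the commit message rather than copied from pmap_bootstrap():

    #include <stdint.h>

    #define SATP_MODE_SV39  (8ULL << 60)
    #define SATP_MODE_SV48  (9ULL << 60)
    #define SATP_PPN(pa)    (((uint64_t)(pa) >> 12) & ((1ULL << 44) - 1))

    static uint64_t
    satp_value(uint64_t top_table_pa, uint64_t mode)
    {
            return (mode | SATP_PPN(top_table_pa));
    }

    /*
     * The probe: write satp with SATP_MODE_SV48 and read it back.  Per the
     * spec, a write with an unsupported MODE has no effect, so an
     * implementation without SV48 leaves satp unchanged and the kernel
     * falls back to an SV39 root table.
     */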


# 6ce716f7 01-Mar-2022 Mark Johnston <markj@FreeBSD.org>

riscv: Add support for dynamically allocating L1 page table pages

This is required in SV48 mode.

Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34279


# 13211172 01-Mar-2022 Mark Johnston <markj@FreeBSD.org>

riscv: Handle four-level page tables in various pmap traversal routines

Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34278


# ceed6148 01-Mar-2022 Mark Johnston <markj@FreeBSD.org>

riscv: Maintain the allpmaps list only in SV39 mode

When four-level page tables are used, there is no need to distribute
updates to the top-level page to all pmaps.

Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34277


# 5cf3a821 01-Mar-2022 Mark Johnston <markj@FreeBSD.org>

riscv: Add pmap helper functions required by four-level page tables

No functional change intended.

Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34276


# 59f192c5 01-Mar-2022 Mark Johnston <markj@FreeBSD.org>

riscv: Add various pmap definitions needed to support SV48 mode

No functional change intended.

Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34272


# 2e956c30 01-Mar-2022 Mark Johnston <markj@FreeBSD.org>

riscv: Use generic CSR macros for writing SATP

Instead of having the one-off load_satp(), just use csr_write(). No
functional change intended.

Reviewed by: alc, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34271


# 82f4e0d0 01-Mar-2022 Mark Johnston <markj@FreeBSD.org>

riscv: Rename struct pmap's pm_l1 field to pm_top

In SV48 mode, the top-level page will be an L0 page rather than an L1
page. Rename the field accordingly. No functional change intended.

Reviewed by: alc, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34270


# d5c0a7b6 22-Feb-2022 Mark Johnston <markj@FreeBSD.org>

riscv: Fix another race in pmap_pinit()

Commit c862d5f2a789 ("riscv: Fix a race in pmap_pinit()") did not really
fix the race. Alan writes,

Suppose that N entries in the L1 tables are in use, and we are in the
middle of the memcpy(). Specifically, we have read the zero-filled
(N+1)st entry from the kernel L1 table. Then, we are preempted. Now,
another core/thread does pmap_growkernel(), which fills the (N+1)st
entry. Finally, we return to the original core/thread, and overwrite
the valid entry with the zero that we earlier read.

Try to fix the race properly, by copying kernel L1 entries while holding
the allpmaps lock. To avoid doing unnecessary work while holding this
global lock, copy only the entries that we expect to be valid.

Fixes: c862d5f2a789 ("riscv: Fix a race in pmap_pinit()")
Reported by: alc, jrtc27
Reviewed by: alc
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34267


# c862d5f2 08-Feb-2022 Mark Johnston <markj@FreeBSD.org>

riscv: Fix a race in pmap_pinit()

All pmaps share the top half of the address space. With 3-level page
tables, the top-level kernel map entries are not static: they might
change if the kernel map is extended (via pmap_growkernel()) or a 1GB
mapping in the direct map is demoted (not implemented yet). Thus the
riscv pmap maintains the allpmaps list to synchronize updates to
top-level entries.

When a pmap is created, it is inserted into this list after copying
top-level entries from the kernel pmap. The copying is done without
holding the allpmaps lock, and it is possible for pmap_pinit() to race
with kernel map updates. In particular, if a thread is modifying L1
entries, and a concurrent pmap_pinit() copies the old version of the
entries, it might not receive the update.

Fix the problem by copying the kernel map entries after inserting the
pmap into the list. This ensures that the nascent pmap always receives
updates, though pmap_distribute_l1() may race with the page copy.

Reviewed by: mhorne, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34158


# e161dfa9 24-Dec-2021 Alan Cox <alc@FreeBSD.org>

Fix pmap_is_prefaultable() on arm64 and riscv

The current implementations never correctly return TRUE. In all cases,
when they currently return TRUE, they should have returned FALSE. And,
in some cases, when they currently return FALSE, they should have
returned TRUE. Except for its effects on performance, specifically,
additional page faults and pointless calls to pmap_enter_quick() that
abort, this error is harmless. That is why it has gone unnoticed.

Add a comment to the amd64, arm64, and riscv implementations
describing how their return values are computed.

Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D33659


# 84c39222 19-Oct-2021 Mark Johnston <markj@FreeBSD.org>

Convert consumers to vm_page_alloc_noobj_contig()

Remove now-unneeded page zeroing. No functional change intended.

Reviewed by: alc, hselasky, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32006


# a4667e09 19-Oct-2021 Mark Johnston <markj@FreeBSD.org>

Convert vm_page_alloc() callers to use vm_page_alloc_noobj().

Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ. In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.

Similarly, convert vm_page_alloc_domain() callers.

Note that callers are now responsible for assigning the pindex.

Reviewed by: alc, hselasky, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31986


# 682c00a6 17-Oct-2021 Jessica Clarke <jrtc27@FreeBSD.org>

riscv: Implement pmap_mapdev_attr

This is needed for LinuxKPI's _ioremap_attr. This reuses the generic
implementation introduced for aarch64, and itself requires implementing
pmap_kenter, which is trivial to do given riscv currently treats all
mapping attributes the same due to the Svpbmt extension not yet being
ratified and in hardware.

Reviewed by: markj, mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D32445


# 4a9f2f8b 07-Oct-2021 Mitchell Horne <mhorne@FreeBSD.org>

riscv: handle page faults in the unmappable region

When handling a kernel page fault, check explicitly that stval resides
in either the user or kernel address spaces, and make the page fault
fatal if not. Otherwise, a properly crafted address may appear to
pmap_fault() as a valid and present page in the kernel map, causing the
page fault to be retried continuously. This is mainly due to the fact
that the upper bits of virtual addresses are not validated by most of
the pmap code.

Faults of this nature should only occur due to some kind of bug in the
kernel, but it is best to handle them gracefully when they do.

Handle user page faults in the same way, sending a SIGSEGV immediately
when a malformed address is encountered.

Add an assertion to pmap_l1(), which should help catch other bugs of
this kind that make it this far.

Reviewed by: jrtc27, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31208


# 1be2e16d 03-Oct-2021 Jessica Clarke <jrtc27@FreeBSD.org>

riscv: Add a stub pmap_change_attr implementation

pmap_change_attr is required by drm-kmod so we need the function to
exist. Since the Svpbmt extension is on the horizon we will likely end
up with a real implementation of it, so this stub implementation does
all the necessary page table walking to validate the input, ensuring
that no new errors are returned once it's implemented fully (other than
due to out of memory conditions when demoting L2 entries) and providing
a skeleton for that future implementation.

Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31996


# 98138bbd 09-Aug-2021 Jessica Clarke <jrtc27@FreeBSD.org>

riscv: Fix pmap_alloc_l2 when it should allocate a new L1 entry

The current code checks the RWX bits are 0 but does not check the V bit
is non-zero, meaning not-yet-allocated L1 entries that are still zero
are regarded as being allocated. This is likely due to copying the arm64
code that checks ATTR_DESC_MASK is L1_TABLE, which encompasses both the
type and the validity in a single field, and erroneously translating
that to a check of just PTE_RWX being 0 to indicate non-leaf, forgetting
about the V bit. This then results in the following panic:

panic: Fatal page fault at 0xffffffc0005cf292: 0x00000000000050
cpuid = 1
time = 1628379581
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
kdb_backtrace() at kdb_backtrace+0x2c
vpanic() at vpanic+0x148
panic() at panic+0x2a
page_fault_handler() at page_fault_handler+0x1ba
do_trap_supervisor() at do_trap_supervisor+0x7a
cpu_exception_handler_supervisor() at
cpu_exception_handler_supervisor+0x70
--- exception 13, tval = 0x50
pmap_enter_l2() at pmap_enter_l2+0xb2
pmap_enter_object() at pmap_enter_object+0x15e
vm_map_pmap_enter() at vm_map_pmap_enter+0x228
vm_map_insert() at vm_map_insert+0x4ec
vm_map_find() at vm_map_find+0x474
vm_map_find_min() at vm_map_find_min+0x52
vm_mmap_object() at vm_mmap_object+0x1ba
vn_mmap() at vn_mmap+0xf8
kern_mmap() at kern_mmap+0x4c4
sys_mmap() at sys_mmap+0x38
do_trap_user() at do_trap_user+0x208
cpu_exception_handler_user() at cpu_exception_handler_user+0x72
--- exception 8, tval = 0x1dd

Instead, we should just check the V bit, as on amd64, and assert that
any valid L1 entries are not leaves, since an L1 leaf would render the
entire range allocated and thus we should not have attempted to map that
VA in the first place.

Reported by: David Gilbert <dgilbert@daveg.ca>
MFC after: 1 week
Reviewed by: markj, mhorne
Differential Revision: https://reviews.freebsd.org/D31460


# 4a235049 22-Jul-2021 Jessica Clarke <jrtc27@FreeBSD.org>

riscv: Fix pmap_kextract racing with concurrent superpage promotion/demotion

This repeats amd64's cfcbf8c6fd3b (r180498) and i386's cf3508519c5e
(r202894) but for riscv; pmap_kextract must be lock-free and so it can
race with superpage promotion and demotion, thus the L2 entry must only
be loaded once to avoid using inconsistent state.

PR: 250866
Reviewed by: markj, mhorne
Tested by: David Gilbert <dgilbert@daveg.ca>
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31253
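
A sketch of the single-load rule described above, with simplified types (not the actual pmap_kextract()); every subsequent test must operate on the local copy, since re-reading *l2 could observe a different entry if promotion or demotion raced in between:

    #include <stdint.h>

    #define PTE_V    (1ULL << 0)
    #define PTE_RWX  ((1ULL << 1) | (1ULL << 2) | (1ULL << 3))

    static uint64_t
    l2e_snapshot(volatile uint64_t *l2, int *is_superpage)
    {
            uint64_t l2e = *l2;     /* load the L2 entry exactly once */

            /* All later tests use the local copy, never *l2 again. */
            *is_superpage = ((l2e & PTE_V) != 0 && (l2e & PTE_RWX) != 0);
            return (l2e);
    }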


# ade2ea3c 20-Jul-2021 Jessica Clarke <jrtc27@FreeBSD.org>

riscv: Fix pindex level confusion

The pindex values are assigned from the L3 leaves upwards, meaning there
are NUL2E L3 tables and then NUL1E L2 tables (with a further NUL0E L1
tables in future when we implement Sv48 support). Therefore anything
below NUL2E is an L3 table's page and anything above or equal to NUL2E
is an L2 table's page (with the threshold of NUL2E + NUL1E marking the
start of the L1 tables' pages in Sv48). Thus all the comparisons and
arithmetic operations must use NUL2E to handle the L3/L2 allocation (and
thus L2/L1 entry) transition point, not NUL1E as all but pmap_alloc_l2
were doing.

To make matters confusing, the NUL1E and NUL2E definitions in the RISC-V
pmap are based on a 4-level page hierarchy but we currently use the
3-level Sv39 format (as that's the only required one, and hardware
support for the 4-level Sv48 is not widespread). This means that, in
effect, the above bug cancels out with the bloated NULxE definitions
such that things "work" (but are still technically wrong, and thus would
break when adding Sv48 support), with one exception. pmap_enter_l2 is
currently the only function to use the correct constant, but since
_pmap_alloc_l3 uses the incorrect constant, it will do complete nonsense
when it needs to allocate a new L2 table (which is rather rare). In this
instance, _pmap_alloc_l3, whilst it would correctly determine the pindex
was for an L2 table, would only subtract NUL1E when computing l1index
and thus go way out of bounds (by 511*512*512 bytes, or 127.75 GiB) of
its own L1 table and, thanks to pmap_distribute_l1, of every other
pmap's L1 table in the whole system. This has likely never been hit as
it would presumably instantly fault and panic.

Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31087
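
A sketch of the classification the commit describes, with the table counts passed as parameters instead of using the kernel's NULxE macros (illustrative only):

    #include <stdint.h>

    enum ptp_level { PTP_L3, PTP_L2, PTP_L1 };

    /*
     * Pindexes are handed out from the leaves upward:
     *   first NUL2E pindexes -> L3 table pages (one per L2 entry)
     *   next  NUL1E pindexes -> L2 table pages (one per L1 entry)
     *   next  NUL0E pindexes -> L1 table pages (Sv48 only)
     * so the L3/L2 boundary is NUL2E, not NUL1E.
     */
    static enum ptp_level
    ptp_level(uint64_t pindex, uint64_t nul2e, uint64_t nul1e)
    {
            if (pindex < nul2e)
                    return (PTP_L3);
            if (pindex < nul2e + nul1e)
                    return (PTP_L2);        /* L1 slot = pindex - nul2e */
            return (PTP_L1);
    }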


# b092c58c 14-Jul-2021 Mark Johnston <markj@FreeBSD.org>

Assert that valid PTEs are not overwritten when installing a new PTP

amd64 and 32-bit ARM already had assertions to this effect. Add them to
other pmaps.

Reviewed by: alc, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31171


# 317113bb 06-Jun-2021 Mark Johnston <markj@FreeBSD.org>

riscv: Rename pmap_fault_fixup() to pmap_fault()

This is consistent with other platforms, specifically arm and arm64. No
functional change intended.

Reviewed by: jrtc27
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30645


# c05748e0 06-Jun-2021 Mark Johnston <markj@FreeBSD.org>

riscv: Handle hardware-managed dirty bit updates in pmap_promote_l2()

pmap_promote_l2() failed to handle implementations which set the
accessed and dirty flags. In particular, when comparing the attributes
of a run of 512 PTEs, we must handle the possibility that the hardware
will set PTE_D on a clean, writable mapping.

Following the example of amd64 and arm64, change riscv's
pmap_promote_l2() to downgrade clean, writable mappings to read-only, so
that updates are synchronized by the pmap lock.

Fixes: f6893f09d
Reported by: Nathaniel Filardo <nwf20@cl.cam.ac.uk>
Tested by: Nathaniel Filardo <nwf20@cl.cam.ac.uk>
Reviewed by: jrtc27, alc, Nathaniel Filardo
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30644
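
A sketch of the downgrade rule described above (simplified, not the real pmap_promote_l2()): a clean, writable 4KB mapping is made read-only before promotion so that a hardware PTE_D update cannot appear mid-comparison of the 512 constituent PTEs; any later write then faults and is handled under the pmap lock.

    #include <stdint.h>

    #define PTE_W  (1ULL << 2)
    #define PTE_D  (1ULL << 7)

    static uint64_t
    promote_canonical_bits(uint64_t l3e)
    {
            if ((l3e & (PTE_D | PTE_W)) == PTE_W)
                    l3e &= ~PTE_W;          /* clean + writable -> read-only */
            return (l3e);
    }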


# 67932460 17-Feb-2021 John Baldwin <jhb@FreeBSD.org>

Add a VA_IS_CLEANMAP() macro.

This macro returns true if a provided virtual address is contained
in the kernel's clean submap.

In CHERI kernels, the buffer cache and transient I/O map are allocated
as separate regions. Abstracting this check reduces the diff relative
to FreeBSD. It is perhaps slightly more readable as well.

Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D28710


# 0628f683 04-Nov-2020 Mitchell Horne <mhorne@FreeBSD.org>

riscv pmap: add some pv list assertions

Ensure that we don't end up with a superpage in the vm_page_t's pv list.

This may help with debugging the panic reported in PR 250866, in which
l3 in pmap_remove_write() was found to be NULL. Adding a KASSERT to this
function will help narrow down the cause of this panic the next time it
occurs.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D28109


# 1dce7d9e 18-Dec-2020 John Baldwin <jhb@FreeBSD.org>

Skip the vm.pmap.kernel_maps sysctl by default.

This sysctl node can generate very verbose output, so don't trigger it
for sysctl -a or sysctl vm.pmap.

Reviewed by: markj, kib
Differential Revision: https://reviews.freebsd.org/D27504


# caaddb88 04-Nov-2020 Mitchell Horne <mhorne@FreeBSD.org>

riscv: set kernel_pmap hart mask more precisely

In pmap_bootstrap(), we fill kernel_pmap->pm_active since it is
invariably active on all harts. However, this marks it as active even
for harts that don't exist in the system, which can cause issues when the
mask is passed to the SBI firmware via sbi_remote_sfence_vma().
Specifically, the SBI spec allows SBI_ERR_INVALID_PARAM to be returned
when an invalid hart is set in the mask.

The latest version of OpenSBI does not have this issue, but v0.6 does,
and this is triggering a recently added KASSERT in CI. Switch to only
setting bits in pm_active for harts that enter the system.

Reported by: Jenkins
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27080


# 02a37049 17-Oct-2020 Mitchell Horne <mhorne@FreeBSD.org>

riscv: zero reserved PTE bits for L2 PTEs

As was done for L3 PTEs in r362853, mask out the reserved bits when
extracting the physical address from an L2 PTE. Future versions of the
spec or custom implementations may make use of these reserved bits, in
which case the resulting physical address could be incorrect.

Submitted by: Nathaniel Filardo <nwf20@cl.cam.ac.uk>
Reviewed by: kp, mhorne
Differential Revision: https://reviews.freebsd.org/D26607


# 6f3b523c 14-Oct-2020 Konstantin Belousov <kib@FreeBSD.org>

Avoid dump_avail[] redefinition.

Move dump_avail[] extern declaration and inlines into a new header
vm/vm_dumpset.h. This fixes default gcc build for mips.

Reviewed by: alc, scottph
Tested by: kevans (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26741


# 7c7b8f57 08-Sep-2020 Mitchell Horne <mhorne@FreeBSD.org>

RISC-V: fix some mismatched format specifiers

RISC-V is currently built with -Wno-format, which is how these went
undetected. Address them now before re-enabling those warnings.

Differential Revision: https://reviews.freebsd.org/D26319


# 847ab36b 02-Sep-2020 Mark Johnston <markj@FreeBSD.org>

Include the psind in data returned by mincore(2).

Currently we use a single bit to indicate whether the virtual page is
part of a superpage. To support a forthcoming implementation of
non-transparent 1GB superpages, it is useful to provide more detailed
information about large page sizes.

The change converts MINCORE_SUPER into a mask for MINCORE_PSIND(psind)
values, indicating a mapping of size psind, where psind is an index into
the pagesizes array returned by getpagesizes(3), which in turn comes
from the hw.pagesizes sysctl. MINCORE_PSIND(1) is equal to the old
value of MINCORE_SUPER.

For now, two bits are used to record the page size, permitting values
of MAXPAGESIZES up to 4.

Reviewed by: alc, kib
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26238


# e91d4ae8 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

riscv: clean up empty lines in .c and .h files


# b865714d 01-Jul-2020 Kristof Provost <kp@FreeBSD.org>

riscv pmap: zero reserved pte bits in ppn

The top 10 bits of a pte are reserved by specification[1] and are not part of
the PPN.

[1] 'Volume II: RISC-V Privileged Architectures V20190608-Priv-MSU-Ratified',
'4.4.1 Addressing and Memory Protection', page 72: "The PTE format for Sv39 is
shown in Figure 4.18. ... Bits 63–54 are reserved for future use and must be
zeroed by software for forward compatibility."

Submitted by: Nathaniel Filardo <nwf20@cl.cam.ac.uk>
Reviewed by: kp, mhorne
Differential Revision: https://reviews.freebsd.org/D25523
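
A sketch of the masking described above, with standalone constants rather than the kernel's macros: in the Sv39 PTE format the PPN lives in bits 53..10, so bits 63..54 must be dropped rather than shifted into the physical address.

    #include <stdint.h>

    #define PTE_PPN_SHIFT   10
    #define PTE_PPN_MASK    ((1ULL << 44) - 1)      /* 44-bit PPN field */

    static uint64_t
    pte_to_phys(uint64_t pte)
    {
            /* Keep only the PPN field; reserved bits 63..54 never reach the PA. */
            return (((pte >> PTE_PPN_SHIFT) & PTE_PPN_MASK) << 12);
    }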


# 133b1f14 24-Jun-2020 Mitchell Horne <mhorne@FreeBSD.org>

Only invalidate the early DTB mapping if it exists

This temporary mapping will become optional. Booting via loader(8)
means that the DTB will have already been copied into the kernel's
staging area, and is therefore covered by the early KVA mappings.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D24911


# f7910a3d 08-Jun-2020 Alex Richardson <arichardson@FreeBSD.org>

sys/riscv: Remove debug printfs

They are only visible with EARLY_PRINTF so don't show up by default.

Reviewed By: mhorne
Differential Revision: https://reviews.freebsd.org/D25152


# 0721214a 13-May-2020 Jessica Clarke <jrtc27@FreeBSD.org>

riscv: Fix pmap_protect for superpages

When protecting a superpage, we would previously fall through to the
non-superpage case and read the contents of the superpage as PTEs,
potentially modifying them and trying to look up underlying VM pages that
don't exist if they happen to look like PTEs we would care about. This led
to nginx causing an unexpected page fault in pmap_protect that panic'ed the
kernel. Instead, if we see a superpage, we are done for this range and
should continue to the next.

Reviewed by: markj, jhb (mentor)
Approved by: markj, jhb (mentor)
Differential Revision: https://reviews.freebsd.org/D24827


# 820a3f43 18-Apr-2020 Mitchell Horne <mhorne@FreeBSD.org>

RISC-V: use physmem to manage physical memory

Replace our hand-rolled functions with the generic ones provided by
kern/subr_physmem.c. This greatly simplifies the initialization of
physical memory regions and kernel globals.

Tested by: nick
Differential Revision: https://reviews.freebsd.org/D24154


# be4ed3d2 06-Apr-2020 Jessica Clarke <jrtc27@FreeBSD.org>

riscv: Add semicolon missing from r359672

Somehow this got lost between build-testing and submitting to Phabricator.


# 24891abd 06-Apr-2020 Mitchell Horne <mhorne@FreeBSD.org>

RISC-V: copy the DTB to early KVA

The location of the device-tree blob is passed to the kernel by the
previous booting stage (i.e. BBL or OpenSBI). Currently, we leave it
untouched and mark the 1MB of memory holding it as unavailable.

Instead, do what is done by other fake_preload_metadata() routines and
copy to the DTB to KVA space. This is more in line with what loader(8)
will provide us in the future, and it allows us to reclaim the hole in
physical memory.

Reviewed by: markj, kp (earlier version)
Differential Revision: https://reviews.freebsd.org/D24152


# 44c27d70 06-Apr-2020 Jessica Clarke <jrtc27@FreeBSD.org>

riscv: Make sure local hart's icache is synced in pmap_sync_icache

The only way to flush the local hart's icache is with a FENCE.I (or an
equivalent SBI call); a normal FENCE is insufficient and, for the
single-hart case, unnecessary.

Reviewed by: jhb (mentor), markj
Approved by: jhb (mentor), markj
Differential Revision: https://reviews.freebsd.org/D24317


# 1dc32a6d 06-Apr-2020 Jessica Clarke <jrtc27@FreeBSD.org>

riscv: Fix pmap_fault_fixup for L3 pages

Summary:
The parentheses being in the wrong place means that, for L3 pages,
oldpte has all bits except PTE_V cleared, and so all the subsequent
checks against oldpte will fail, causing us to bail out and not retry
the faulting instruction after an SFENCE.VMA. This causes a WITNESS +
INVARIANTS kernel to fault on the "Chisel P3" (BOOM-based) DARPA SSITH
GFE SoC in pmap_init when writing to pv_table and, being a nofault
entry, subsequently panic with:

panic: vm_fault_lookup: fault on nofault entry, addr: 0xffffffc004e00000

Reviewed by: markj
Approved by: markj
Differential Revision: https://reviews.freebsd.org/D24315
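
A generic reconstruction of the kind of misplaced parenthesis described above (the load and bit names are stand-ins, not the committed code): both groupings test validity correctly, but the buggy one leaves only the PTE_V bit in oldpte, so every later check of the accessed/dirty/write bits against oldpte silently fails.

    #include <stdint.h>

    #define PTE_V  (1ULL << 0)
    #define PTE_D  (1ULL << 7)

    static int
    fault_fixup_sketch(volatile uint64_t *l3)
    {
            uint64_t oldpte;

            /* Buggy grouping:  (oldpte = *l3 & PTE_V)  keeps only PTE_V. */
            if (((oldpte = *l3) & PTE_V) == 0)      /* fixed grouping */
                    return (0);

            /* Later tests now see the full entry, not just the V bit. */
            return ((oldpte & PTE_D) != 0);
    }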


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# 9ce0b407 12-Feb-2020 Mitchell Horne <mhorne@FreeBSD.org>

Implement vm.pmap.kernel_maps for RISC-V

This is taken from the arm64 version, with the following simplifications:

- Our current pmap implementation uses a 3-level paging scheme
- The "mode" field has been omitted since RISC-V PTEs don't encode
typical mode attributes

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D23594


# ccfb8acd 12-Feb-2020 Mitchell Horne <mhorne@FreeBSD.org>

RISC-V: un-ifdef vm.kvm_size and vm.kvm_free

Fix formatting and add CTLFLAG_MPSAFE.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D23522


# 5cff1f4d 10-Dec-2019 Mark Johnston <markj@FreeBSD.org>

Introduce vm_page_astate.

This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state. The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.

This change merely adds the structure and updates references to atomic
state fields. No functional change intended.

Reviewed by: alc, jeff, kib
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22650


# 01cef4ca 16-Oct-2019 Mark Johnston <markj@FreeBSD.org>

Remove page locking from pmap_mincore().

After r352110 the page lock no longer protects a page's identity, so
there is no purpose in locking the page in pmap_mincore(). Instead,
if vm.mincore_mapped is set to the non-default value of 0, re-lookup
the page after acquiring its object lock, which holds the page's
identity stable.

The change removes the last callers of vm_page_pa_tryrelock(), so
remove it.

Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21823


# 638f8678 14-Oct-2019 Jeff Roberson <jeff@FreeBSD.org>

(6/6) Convert pmap to expect busy in write-related operations now that all
callers hold it.

This simplifies pmap code and removes a dependency on the object lock.

Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21596


# 205be21d 14-Oct-2019 Jeff Roberson <jeff@FreeBSD.org>

(3/6) Add a shared object busy synchronization mechanism that blocks new page
busy acquires while held.

This allows code that would need to acquire and release a very large number
of page busy locks to use the old mechanism where busy is only checked and
not held. This comes at the cost of false positives but never false
negatives which the single consumer, vm_fault_soft_fast(), handles.

Reviewed by: kib
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21592


# b0fd4615 08-Oct-2019 Mark Johnston <markj@FreeBSD.org>

Avoid erroneously clearing PGA_WRITEABLE in riscv's pmap_enter().

During a CoW fault, we must check for both 4KB and 2MB mappings before
clearing PGA_WRITEABLE on the old mapping's page. Previously we were
only checking for 4KB mappings. This was missed in r344106.

MFC after: 3 days
Sponsored by: The FreeBSD Foundation


# 7992921b 08-Oct-2019 Mark Johnston <markj@FreeBSD.org>

Clear PGA_WRITEABLE in riscv's pmap_remove_l3().

pmap_remove_l3() may remove the last mapping of a page, in which case
it must clear PGA_WRITEABLE.

Reported by: Jenkins, via lwhsu
MFC after: 1 week
Sponsored by: The FreeBSD Foundation


# d4586dd3 27-Sep-2019 Mark Johnston <markj@FreeBSD.org>

Implement pmap_page_is_mapped() correctly on arm64 and riscv.

We must also check for large mappings. pmap_page_is_mapped() is
mostly used in assertions, so the problem was not very noticeable.

Reviewed by: alc
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D21824


# b119329d 25-Sep-2019 Mark Johnston <markj@FreeBSD.org>

Complete the removal of the "wire_count" field from struct vm_page.

Convert all remaining references to that field to "ref_count" and update
comments accordingly. No functional change intended.

Reviewed by: alc, kib
Sponsored by: Intel, Netflix
Differential Revision: https://reviews.freebsd.org/D21768


# e8bcf696 16-Sep-2019 Mark Johnston <markj@FreeBSD.org>

Revert r352406, which contained changes I didn't intend to commit.


# 41fd4b94 16-Sep-2019 Mark Johnston <markj@FreeBSD.org>

Fix a couple of nits in r352110.

- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.

Reported by: alc
Reviewed by: alc, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21639


# fee2a2fa 09-Sep-2019 Mark Johnston <markj@FreeBSD.org>

Change synchronization rules for vm_page reference counting.

There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficient as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped. The DRM ports have been updated to
accommodate the KPI changes.

Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486


# 5d18382b 25-Jul-2019 Alan Cox <alc@FreeBSD.org>

Simplify the handling of superpages in pmap_clear_modify(). Specifically,
if a demotion succeeds, then all of the 4KB page mappings within the
superpage-sized region must be valid, so there is no point in testing the
validity of the 4KB page mapping that is going to be write protected.

Deindent the nearby code.

Reviewed by: kib, markj
Tested by: pho (amd64, i386)
X-MFC after: r350004 (this change depends on arm64 dirty bit emulation)
Differential Revision: https://reviews.freebsd.org/D21027


# cd7795a5 17-Jul-2019 Kristof Provost <kp@FreeBSD.org>

riscv: Return vm_paddr_t in pmap_early_vtophys()

We can't use a u_int to compute the physical address in
pmap_early_vtophys(). Our int is 32-bit, but the physical address is
64-bit. This works fine if everything lives below 0x100000000, but as
soon as it doesn't this breaks.

MFC after: 1 week
Sponsored by: Axiado
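
A simplified illustration of the truncation described above (not the actual pmap_early_vtophys()): with a 32-bit intermediate, any physical address at or above 0x100000000 loses its upper bits, while the vm_paddr_t computation keeps them.

    #include <stdint.h>

    typedef uint64_t vm_paddr_t;    /* physical addresses are 64-bit on riscv */

    static vm_paddr_t
    early_vtophys_sketch(uint64_t kernstart_pa, uint64_t kernstart_va,
        uint64_t va)
    {
            unsigned int bad = (unsigned int)(kernstart_pa + (va - kernstart_va));
            vm_paddr_t good = kernstart_pa + (va - kernstart_va);

            (void)bad;              /* truncated once the PA crosses 4 GiB */
            return (good);          /* full 64-bit physical address */
    }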


# 1fd21cb0 15-Jul-2019 Mark Johnston <markj@FreeBSD.org>

pmap_clear_modify() needs to clear PTE_W.

MFC after: 1 week
Sponsored by: The FreeBSD Foundation


# 24074d28 15-Jul-2019 Mark Johnston <markj@FreeBSD.org>

Fix reference counting in pmap_ts_referenced() on RISC-V.

pmap_ts_referenced() does not necessarily clear the access bit from
all accessed mappings of a given page. Thus, if a scan of the mappings
needs to be restarted, we should be careful to avoid double-counting
accessed mappings whose access bits were not cleared in a previous
attempt.

Reported by: alc
Reviewed by: alc
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20926


# eeacb3b0 08-Jul-2019 Mark Johnston <markj@FreeBSD.org>

Merge the vm_page hold and wire mechanisms.

The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics. The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.

This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead. Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.

No functional change is intended. __FreeBSD_version is bumped.

Reviewed by: alc, kib
Discussed with: jeff
Discussed with: jhb, np (cxgbe)
Tested by: pho (previous version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19247


# 7fe5c13c 04-Jul-2019 Alan Cox <alc@FreeBSD.org>

Merge r349526 from amd64. When we protect an L3 entry, we only call
vm_page_dirty() when, in fact, we are write protecting the page and the L3
entry has PTE_D set. However, pmap_protect() was always calling
vm_page_dirty() when an L2 entry has PTE_D set. Handle L2 entries the
same as L3 entries so that we won't perform unnecessary calls to
vm_page_dirty().

Simplify the loop calling vm_page_dirty() on L2 entries.


# bffa317f 09-Jun-2019 Mitchell Horne <mhorne@FreeBSD.org>

RISC-V: Announce real and available memory at boot

Most architectures print their total (real) and available memory during
boot. Properly initialize the realmem global and print these messages.
Also print the physical memory chunks (behind a bootverbose flag).

Reviewed by: markj
Approved by: markj (mentor)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D20496


# 1fe65d05 08-Jun-2019 Alan Cox <alc@FreeBSD.org>

Correct a new KASSERT() in r348828.

X-MFC with: r348828


# fd2dae0a 08-Jun-2019 Alan Cox <alc@FreeBSD.org>

Implement an alternative solution to the amd64 and i386 pmap problem that we
previously addressed in r348246.

This pmap problem also exists on arm64 and riscv. However, the original
solution developed for amd64 and i386 cannot be used on arm64 and riscv. In
particular, arm64 and riscv do not define a PG_PROMOTED flag in their level
2 PTEs. (A PG_PROMOTED flag makes no sense on arm64, where unlike x86 or
riscv we are required to break the old 4KB mappings before making the 2MB
mapping; and on riscv there are no unused bits in the PTE to define a
PG_PROMOTED flag.)

This commit implements an alternative solution that can be used on all four
architectures. Moreover, this solution has two other advantages. First, on
older AMD processors that required the Erratum 383 workaround, it is less
costly. Specifically, it avoids unnecessary calls to pmap_fill_ptp() on a
superpage demotion. Second, it enables the elimination of some calls to
pagezero() in pmap_kernel_remove_{l2,pde}().

In addition, remove a related stale comment from pmap_enter_{l2,pde}().

Reviewed by: kib, markj (an earlier version)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20538


# 88ea538a 07-Jun-2019 Mark Johnston <markj@FreeBSD.org>

Replace uses of vm_page_unwire(m, PQ_NONE) with vm_page_unwire_noq(m).

These calls are not the same in general: the former will dequeue the
page if it is enqueued, while the latter will just leave it alone. But,
all existing uses of the former apply to unmanaged pages, which are
never enqueued in the first place. No functional change intended.

Reviewed by: kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20470


# b803d0b7 12-May-2019 Ruslan Bukin <br@FreeBSD.org>

Add support for HiFive Unleashed -- the board with a multi-core RISC-V SoC
from SiFive, Inc.

The first core on this SoC (hart 0) is a 64-bit microcontroller.

o Pick a hart to run the boot process using a hart lottery.
This allows hart 0 to be excluded from running the boot process.
(BBL releases hart 0 after the main harts, so it never wins the lottery).
o Renumber CPUs early in boot.
Exclude non-MMU cores. Store the original hart ID in struct pcpu. This
allows the correct destination for IPIs and remote sfence calls to be
determined.

Thanks to SiFive, Inc. for providing the board.

Reviewed by: markj
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D20225


# ef68f03e 10-May-2019 Ruslan Bukin <br@FreeBSD.org>

The RISC-V ISA does not specify how to manage physical memory attributes
(PMAs), so do nothing in pmap_page_set_memattr() instead of panicking.

Reviewed by: markj
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D20209


# 3b5b2029 05-Mar-2019 Mark Johnston <markj@FreeBSD.org>

Implement minidump support for RISC-V.

Submitted by: Mitchell Horne <mhorne063@gmail.com>
Differential Revision: https://reviews.freebsd.org/D18320


# 3a3dfb28 05-Mar-2019 Mark Johnston <markj@FreeBSD.org>

Initialize dump_avail[] on riscv.

Submitted by: Mitchell Horne <mhorne063@gmail.com>
Differential Revision: https://reviews.freebsd.org/D19170


# 91c3fda0 05-Mar-2019 Mark Johnston <markj@FreeBSD.org>

Add pmap_get_tables() for riscv.

This mirrors the arm64 implementation and is for use in the minidump
code.

Submitted by: Mitchell Horne <mhorne063@gmail.com>
Differential Revision: https://reviews.freebsd.org/D18321


# 35c91b0c 13-Feb-2019 Mark Johnston <markj@FreeBSD.org>

Implement per-CPU pmap activation tracking for RISC-V.

This reduces the overhead of TLB invalidations by ensuring that we
only interrupt CPUs which are using the given pmap. Tracking is
performed in pmap_activate(), which gets called during context switches:
from cpu_throw(), if a thread is exiting or an AP is starting, or
cpu_switch() for a regular context switch.

For now, pmap_sync_icache() still must interrupt all CPUs.

Reviewed by: kib (earlier version), jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18874


# 91c85dd8 13-Feb-2019 Mark Johnston <markj@FreeBSD.org>

Implement pmap_clear_modify() for RISC-V.

Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18875


# f6893f09 13-Feb-2019 Mark Johnston <markj@FreeBSD.org>

Implement transparent 2MB superpage promotion for RISC-V.

This includes support for pmap_enter(..., psind=1) as described in the
commit log message for r321378.

The changes are largely modelled after amd64. arm64 has more stringent
requirements around superpage creation to avoid the possibility of TLB
conflict aborts, and these requirements do not apply to RISC-V, which
like amd64 permits simultaneous caching of 4KB and 2MB translations for
a given page. RISC-V's PTE format includes only two software bits, and
as these are already consumed we do not have an analogue for amd64's
PG_PROMOTED. Instead, pmap_remove_l2() always invalidates the entire
2MB address range.

pmap_ts_referenced() is modified to clear PTE_A, now that we support
both hardware- and software-managed reference and dirty bits. Also
fix pmap_fault_fixup() so that it does not set PTE_A or PTE_D on kernel
mappings.

Reviewed by: kib (earlier version)
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18863
Differential Revision: https://reviews.freebsd.org/D18864
Differential Revision: https://reviews.freebsd.org/D18865
Differential Revision: https://reviews.freebsd.org/D18866
Differential Revision: https://reviews.freebsd.org/D18867
Differential Revision: https://reviews.freebsd.org/D18868


# 8fc2164b 28-Jan-2019 Mark Johnston <markj@FreeBSD.org>

Remove a redundant test.

The existence of a PV entry for a mapping guarantees that the mapping
exists, so we should not need to test for that.

Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18866


# c1959ba4 04-Jan-2019 Mark Johnston <markj@FreeBSD.org>

Fix dirty bit handling in pmap_remove_write().

Reviewed by: jhb, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18732


# b679dc7f 04-Jan-2019 Mark Johnston <markj@FreeBSD.org>

Clear PGA_WRITEABLE in pmap_remove_pages().

Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18731


# 7c59ec14 03-Jan-2019 Mark Johnston <markj@FreeBSD.org>

Fix a use-after-free in the riscv pmap_release() implementation.

Don't bother zeroing the top-level page before freeing it. Previously,
the page was freed before being zeroed.

Reviewed by: jhb, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18720


# bad66a29 03-Jan-2019 Mark Johnston <markj@FreeBSD.org>

Synchronize access to the allpmaps list.

The list will be removed with some future work.

Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18721


# 60af3400 03-Jan-2019 Mark Johnston <markj@FreeBSD.org>

Fix some issues with the riscv pmap_protect() implementation.

- Handle VM_PROT_EXECUTE.
- Clear PTE_D and mark the page dirty when removing write access
from a mapping.
- Atomically clear PTE_W to avoid clobbering a hardware PTE update.

Reviewed by: jhb, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18719
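
A minimal sketch of the last point (illustrative only; the committed change
may use a different atomic primitive): clear PTE_W with a compare-and-swap
loop so that a concurrent hardware update of PTE_A/PTE_D in the same word is
not lost to a plain read-modify-write.

static void
demo_clear_pte_w(pt_entry_t *l3)
{
        pt_entry_t old;

        do {
                old = *l3;
                /* Retry if a hardware A/D update raced with us. */
        } while (atomic_cmpset_64(l3, old, old & ~PTE_W) == 0);
}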


# 8ccaccd5 03-Jan-2019 Mark Johnston <markj@FreeBSD.org>

Set PTE_U on PTEs created by pmap_enter_quick().

Otherwise prefaulted entries are not accessible from user mode and
end up triggering a fault upon access, so prefaulting has no effect.

Reviewed by: jhb, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18718


# 619999ff 03-Jan-2019 Mark Johnston <markj@FreeBSD.org>

Use regular stores to update PTEs in the riscv pmap layer.

There's no need to use atomics when the previous value isn't needed.
No functional change intended.

Reviewed by: kib
Discussed with: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18717
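
As a sketch of the point above (hypothetical helper name, not the committed
diff):

static void
demo_set_pte(pt_entry_t *l3, pt_entry_t new_l3)
{
        /*
         * The previous value is not consumed, so no atomic swap is
         * needed: a single aligned 64-bit store is already indivisible.
         * When the old value does matter (e.g. to preserve PTE_D), an
         * atomic read-and-clear or swap is still required.
         */
        *l3 = new_l3;
}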


# 105c3171 14-Dec-2018 Mark Johnston <markj@FreeBSD.org>

Avoid needless TLB invalidations in pmap_remove_pages().

pmap_remove_pages() is called during process termination, when it is
guaranteed that no other CPU may access the mappings being torn down.
In particular, it is unnecessary to invalidate each mapping individually
since we do a pmap_invalidate_all() at the end of the function.

Also, don't call pmap_invalidate_all() while holding a PV list lock; the
global pvh lock is sufficient.

Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18562


# 4f86ff4e 14-Dec-2018 Mark Johnston <markj@FreeBSD.org>

Assume that pmap_l1() will return a PTE.

pmaps on RISC-V always have an L1 page table page, so we don't need to
check for this when performing lookups.

Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18563


# 4a020868 14-Dec-2018 Mark Johnston <markj@FreeBSD.org>

Clean up the riscv pmap_bootstrap() implementation.

- Build up phys_avail[] in a single loop, excluding memory used by
the loaded kernel.
- Fix an array indexing bug in the aforementioned phys_avail[]
initialization.[1]
- Remove some unneeded code copied from the arm64 implementation.

PR: 231515 [1]
Reviewed by: jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18464


# a64886ce 10-Dec-2018 Mark Johnston <markj@FreeBSD.org>

Remove an unused malloc(9) type.

MFC after: 1 week
Sponsored by: The FreeBSD Foundation


# e7d46a1d 10-Dec-2018 Mark Johnston <markj@FreeBSD.org>

Use inline tests for individual PTE bits in the RISC-V pmap.

Inline tests for PTE_* bits are easy to read and don't really require a
predicate function, and predicates which operate on a pt_entry_t are
inconvenient when working with L1 and L2 page table entries.

Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18461


# 1f5e341b 07-Dec-2018 Mark Johnston <markj@FreeBSD.org>

Rename sptbr to satp per v1.10 of the privileged architecture spec.

Add a subroutine for updating satp, for use when updating the
active pmap. No functional change intended.

Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18462
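
A hedged sketch of what such a satp-update subroutine can look like (the
name and the exact fencing here are assumptions, not the committed code):

static __inline void
demo_load_satp(uint64_t satp_value)
{
        /* satp_value encodes the translation mode, ASID and root table PPN. */
        __asm __volatile("csrw satp, %0" :: "r" (satp_value));
        /* Drop translations cached under the previous root page table. */
        __asm __volatile("sfence.vma" ::: "memory");
}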


# ff9738d9 05-Nov-2018 John Baldwin <jhb@FreeBSD.org>

Rework setting PTE_D for kernel mappings.

Rather than unconditionally setting PTE_D for all writeable kernel
mappings, set PTE_D for writable mappings of unmanaged pages (whether
user or kernel). This matches what amd64 does and also matches what
the RISC-V spec suggests (preset the A and D bits on mappings where
the OS doesn't care about the state).

Suggested by: alc
Reviewed by: alc, markj
Sponsored by: DARPA
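
A small sketch of the policy (the helper is hypothetical; the page flag and
PTE names are the usual ones): preset A, and D for writable mappings, only
when the page is unmanaged, i.e. when the pmap layer will never be asked
about its referenced/dirty state.

static pt_entry_t
demo_preset_ad_bits(vm_page_t m, vm_prot_t prot)
{
        pt_entry_t bits = 0;

        if ((m->oflags & VPO_UNMANAGED) != 0) {
                bits |= PTE_A;          /* reference state is never queried */
                if ((prot & VM_PROT_WRITE) != 0)
                        bits |= PTE_D;  /* dirty state is never queried */
        }
        return (bits);
}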


# d198cb6d 01-Nov-2018 John Baldwin <jhb@FreeBSD.org>

Restrict setting PTE execute permissions on RISC-V.

Previously, RISC-V was enabling execute permissions in PTEs for any
readable page. Now, execute permissions are only enabled if they were
explicitly specified (e.g. via PROT_EXEC to mmap). The one exception
is that the initial kernel mapping in locore still maps all of the
kernel RWX.

While here, change the fault type passed to vm_fault and
pmap_fault_fixup to only include a single VM_PROT_* value representing
the faulting access to match other architectures rather than passing a
bitmask.

Reviewed by: markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D17783


# 6f888020 01-Nov-2018 John Baldwin <jhb@FreeBSD.org>

Set PTE_A and PTE_D for user mappings in pmap_enter().

This assumes that an access according to the prot in 'flags' triggered
a fault and is going to be retried after the fault returns, so the two
flags are set preemptively to avoid refaulting on the retry.

While here, only bother setting PTE_D for kernel mappings in pmap_enter
for writable mappings.

Reviewed by: markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D17782


# b7b39193 25-Oct-2018 Ruslan Bukin <br@FreeBSD.org>

o Add the pmap lock around pmap_fault_fixup() to ensure another thread cannot
modify the L3 PTE after we load the old value and before we store the new one.
o Preset the A (accessed) and D (dirty) bits for kernel mappings.

Reported by: kib
Reviewed by: markj
Discussed with: jhb
Sponsored by: DARPA, AFRL


# b977d819 18-Oct-2018 Ruslan Bukin <br@FreeBSD.org>

Support RISC-V implementations that do not manage the A and D bits
(e.g. RocketChip, lowRISC and derivatives).

RISC-V page table entries support A (accessed) and D (dirty) bits. The
spec makes hardware support for these bits optional. Implementations that
do not manage these bits in hardware raise page faults for accesses to a
valid page without A set and writes to a writable page without D set.
Check for these types of faults when handling a page fault and fix up the
PTE without calling vm_fault if they occur.

Reviewed by: jhb, markj
Approved by: re (gjb)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17424
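
A rough sketch of the fixup idea (not the committed pmap_fault_fixup();
locking and TLB handling are elided): if the faulting access is permitted by
a valid PTE that merely lacks A, or D for a write, set the missing bits and
retry instead of entering vm_fault().

static bool
demo_fault_fixup(pt_entry_t *l3, vm_prot_t ftype)
{
        pt_entry_t pte, bits;

        pte = *l3;                      /* the real code holds the pmap lock */
        if ((pte & PTE_V) == 0)
                return (false);         /* truly invalid: go to vm_fault() */
        if ((ftype & VM_PROT_WRITE) != 0 && (pte & PTE_W) == 0)
                return (false);         /* genuine protection fault */

        bits = PTE_A;
        if ((ftype & VM_PROT_WRITE) != 0)
                bits |= PTE_D;
        if ((pte & bits) == bits)
                return (false);         /* nothing to fix up */
        *l3 = pte | bits;               /* set the bits and retry the access */
        return (true);
}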


# 3c8efd61 18-Oct-2018 Ruslan Bukin <br@FreeBSD.org>

Revert r339421 due to unintended files being included in the commit.

Reported by: ian
Approved by: re (gjb)
Sponsored by: DARPA, AFRL


# 53c6ad1d 18-Oct-2018 Ruslan Bukin <br@FreeBSD.org>

Support RISC-V implementations that do not manage the A and D bits
(e.g. RocketChip, lowRISC and derivatives).

RISC-V page table entries support A (accessed) and D (dirty) bits. The
spec makes hardware support for these bits optional. Implementations that
do not manage these bits in hardware raise page faults for accesses to a
valid page without A set and writes to a writable page without D set.
Check for these types of faults when handling a page fault and fix up the
PTE without calling vm_fault if they occur.

Reviewed by: jhb, markj
Approved by: re (gjb)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17424


# 94036a25 16-Oct-2018 Ruslan Bukin <br@FreeBSD.org>

Invalidate TLB on a local hart.

This was missed in r339367 ("Various fixes for TLB management on RISC-V.").

This fixes operation on lowRISC.

Reviewed by: jhb
Approved by: re (gjb)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17583


# 73efa2fb 15-Oct-2018 John Baldwin <jhb@FreeBSD.org>

Various fixes for TLB management on RISC-V.

- Remove the arm64-specific cpu_*cache* and cpu_tlb_flush* functions.
Instead, add RISC-V specific inline functions in cpufunc.h for the
fence.i and sfence.vma instructions.
- Catch up to changes in the arm64 pmap and remove all the cpu_dcache_*
calls, pmap_is_current, pmap_l3_valid_cacheable, and PTE_NEXT bits from
pmap.
- Remove references to the unimplemented riscv_setttb().
- Remove unused cpu_nullop.
- Add a link to the SBI doc to sbi.h.
- Add support for a 4th argument in SBI calls. It's not documented but
it seems implied for the asid argument to SBI_REMOVE_SFENCE_VMA_ASID.
- Pass the arguments from sbi_remote_sfence*() to the SEE. BBL ignores
them so this is just cosmetic.
- Flush icaches on other CPUs when they resume from kdb in case the
debugger wrote any breakpoints while the CPUs were paused in the IPI_STOP
handler.
- Add SMP vs UP versions of pmap_invalidate_* similar to amd64. The
UP versions just use simple fences. The SMP versions use the
sbi_remove_sfence*() functions to perform TLB shootdowns. Since we
don't have a valid pm_active field in the riscv pmap, just IPI all
CPUs for all invalidations for now.
- Remove an extraneous TLB flush from the end of pmap_bootstrap().
- Don't do a TLB flush when writing new mappings in pmap_enter(), only if
modifying an existing mapping. Note that for COW faults a TLB flush is
only performed after explicitly clearing the old mapping as is done in
other pmaps.
- Sync the i-cache on all harts before updating the PTE for executable
mappings in pmap_enter and pmap_enter_quick. Previously the i-cache was
only sync'd after updating the PTE in pmap_enter.
- Use sbi_remote_fence() instead of smp_rendezvous in pmap_sync_icache().

Reviewed by: markj
Approved by: re (gjb, kib)
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D17414
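
The cpufunc.h wrappers mentioned in the first item above are, in essence,
one instruction each. A hedged sketch (illustrative names; the committed
signatures may differ):

static __inline void
demo_fence_i(void)
{
        __asm __volatile("fence.i" ::: "memory");       /* i-cache vs. stores */
}

static __inline void
demo_sfence_vma(void)
{
        __asm __volatile("sfence.vma" ::: "memory");    /* flush all translations */
}

static __inline void
demo_sfence_vma_page(uintptr_t addr)
{
        __asm __volatile("sfence.vma %0" :: "r" (addr) : "memory");
}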


# 7a102e04 24-Sep-2018 John Baldwin <jhb@FreeBSD.org>

Implement pmap_sync_icache().

This invokes "fence" on the hart performing the write followed by an IPI
to execute "fence.i" on all harts.

This is required to support userland debuggers setting breakpoints in
user processes.

Reviewed by: br (earlier version), markj
Approved by: re (gjb)
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D17139


# f0165b1c 28-Aug-2018 Konstantin Belousov <kib@FreeBSD.org>

Remove {max/min}_offset() macros, use vm_map_{max/min}() inlines.

Exposing max_offset and min_offset defines in public headers is
causing clashes with variable names, for example when building QEMU.

Based on the submission by: royger
Reviewed by: alc, markj (previous version)
Sponsored by: The FreeBSD Foundation (kib)
MFC after: 1 week
Approved by: re (marius)
Differential revision: https://reviews.freebsd.org/D16881


# e45b89d2 01-Aug-2018 Konstantin Belousov <kib@FreeBSD.org>

Add pmap_is_valid_memattr(9).

Discussed with: alc
Sponsored by: The FreeBSD Foundation, Mellanox Technologies
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D15583


# a3055a5e 26-Jul-2018 Mark Johnston <markj@FreeBSD.org>

Implement pmap_mincore() for riscv.

Reviewed by: alc, br
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D16444


# afeed44d 14-Jul-2018 Alan Cox <alc@FreeBSD.org>

Invalidate the mapping before updating its physical address.

Doing so ensures that all threads sharing the pmap have a consistent
view of the mapping. This fixes the problem described in the commit
log message for r329254 without the overhead of an extra page fault
in the common case. (Now that all pmap_enter() implementations are
similarly modified, the workaround added in r329254 can be removed,
reducing the overhead of COW faults.)

With this change we can reuse the PV entry from the old mapping,
potentially avoiding a call to reclaim_pv_chunk(). Otherwise, there is
nothing preventing the old PV entry from being reclaimed. In rare
cases this could result in the PTE's page table page being freed,
leading to a use-after-free of the page when the updated PTE is written
following the allocation of the PV entry for the new mapping.

Reviewed by: br, markj
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D16261
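
A compressed sketch of the ordering described above (helper names follow the
riscv pmap's usual conventions, but this is not the committed diff):

static void
demo_change_pa(pmap_t pmap, vm_offset_t va, pt_entry_t *l3, pt_entry_t new_l3)
{
        (void)pmap_load_clear(l3);      /* 1. atomically invalidate the old PTE */
        pmap_invalidate_page(pmap, va); /* 2. flush stale TLB entries everywhere */
        *l3 = new_l3;                   /* 3. install the new physical address */
}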


# 8c8ee2ee 04-Mar-2018 Konstantin Belousov <kib@FreeBSD.org>

Unify bulk free operations in several pmaps.

Submitted by: Yoshihiro Ota
Reviewed by: markj
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D13485


# 2c0f13aa 20-Feb-2018 Konstantin Belousov <kib@FreeBSD.org>

vm_wait() rework.

Make vm_wait() take a vm_object argument, which specifies the domain
set to wait on for the min condition to pass. If there is no object
associated with the wait, use curthread's policy domainset. The
mechanics of the wait in vm_wait() and vm_wait_domain() are supplied by
the new helper vm_wait_doms(), which directly takes the bitmask of the
domains to wait on for the min condition to pass.

Eliminate pagedaemon_wait(). vm_domain_clear() handles the same
operations.

Eliminate VM_WAIT and VM_WAITPFAULT macros, the direct functions calls
are enough.

Eliminate several control state variables from vm_domain, unneeded
after the vm_wait() conversion.

Sketched and reviewed by: jeff
Tested by: pho
Sponsored by: The FreeBSD Foundation, Mellanox Technologies
Differential revision: https://reviews.freebsd.org/D14384


# e958ad4c 12-Feb-2018 Jeff Roberson <jeff@FreeBSD.org>

Make v_wire_count a per-cpu counter(9) counter. This eliminates a
significant source of cache line contention from vm_page_alloc(). Use
accessors and vm_page_unwire_noq() so that the mechanism can be easily
changed in the future.

Reviewed by: markj
Discussed with: kib, glebius
Tested by: pho (earlier version)
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14273
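
For context, a minimal counter(9) usage sketch (variable and function names
here are illustrative, not the committed vm_page changes): each increment
lands in a per-CPU slot, and only a full fetch walks all CPUs.

#include <sys/param.h>
#include <sys/counter.h>
#include <sys/malloc.h>

static counter_u64_t demo_wire_count;

static void
demo_counter_init(void)
{
        demo_wire_count = counter_u64_alloc(M_WAITOK);  /* per-CPU storage */
}

static void
demo_wire_one(void)
{
        counter_u64_add(demo_wire_count, 1);    /* no shared cache line touched */
}

static uint64_t
demo_wire_total(void)
{
        return (counter_u64_fetch(demo_wire_count));    /* sums every CPU's slot */
}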


# ab7c09f1 08-Feb-2018 Mark Johnston <markj@FreeBSD.org>

Use vm_page_unwire_noq() instead of directly modifying page wire counts.

No functional change intended.

Reviewed by: alc, kib (previous revision)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D14266


# fb3cc1c3 07-Dec-2017 Bruce Evans <bde@FreeBSD.org>

Move instantiation of msgbufp from 9 MD files to subr_prf.c.

This variable should be pure MI except possibly for reading it in MD
dump routines. Its initialization was pure MD in 4.4BSD, but FreeBSD
changed this in r36441 in 1998. There were many imperfections in
r36441. This commit fixes only a small one, to simplify fixing the
others one arch at a time. (r47678 added support for
special/early/multiple message buffer initialization, which I want in
a more general form, but it was too fragile to use because hacking
on the msgbufp global corrupted it, and it was only used for 5 hours in
-current...)


# 0d4435df 22-Nov-2017 Ruslan Bukin <br@FreeBSD.org>

o Invalidate the correct page in pmap_protect().
With this bug fix we don't need to invalidate all the entries.
o Remove a call to pmap_invalidate_all(). This was never called
as the anyvalid variable is never set.

Obtained from: arm64/pmap.c (r322797, r322800)
Sponsored by: DARPA, AFRL


# df57947f 18-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

spdx: initial adoption of licensing ID tags.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes: yes
Differential Revision: https://reviews.freebsd.org/D13133


# 2582d7a9 19-Sep-2017 Alan Cox <alc@FreeBSD.org>

Sync with amd64/arm/arm64/i386/mips pmap change r288256:

Exploit r288122 to address a cosmetic issue. Since PV chunk pages don't
belong to a vm object, they can't be paged out. Since they can't be paged
out, they are never enqueued in a paging queue. Nonetheless, passing
PQ_INACTIVE to vm_page_unwire() creates the appearance that these pages
are being enqueued in the inactive queue. As of r288122, we can avoid
this false impression by passing PQ_NONE.

MFC after: 1 week


# af19cc59 10-Aug-2017 Ruslan Bukin <br@FreeBSD.org>

Support for v1.10 (latest) of RISC-V privilege specification.

The new version is not compatible with v1.9.1 (the previous version)
in supervisor mode.

Highlights:
o BBL (Berkeley Boot Loader) no longer provides initial page tables,
allowing us to choose the VM layout, build page tables manually,
and enable the MMU in S-mode.
o The SBI interface changed.
o GENERIC kernel.
FDT is now the chosen standard for RISC-V hardware description,
and a DTB is provided by Spike (the golden-model simulator). This
allows us to introduce a GENERIC kernel. However, the console and
timer devices are not described in the DTB, so move them
temporarily to the nexus bus.
o The supervisor can't access userspace by default. The solution is
to set the SUM (permit Supervisor User Memory access) bit in the
sstatus register.
o The compressed extension is now enabled by default.
o An external GCC 7.1 compiler is used.
o _gp is renamed to __global_pointer$.
o The compiler's -march= string is now used, allowing us to choose the
required extensions (compressed, FPU, atomic, etc.).

Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D11800


# 71e35b7d 14-Nov-2016 Ruslan Bukin <br@FreeBSD.org>

Check if L2 entry exists for the given VA before loading L3 entry.

This fixes a panic that was easy to reproduce by executing
"(/bin/ls &)" in the shell.

Sponsored by: DARPA, AFRL
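
A sketch of the added guard (helper names follow the riscv pmap's
conventions; this is not the committed diff): the L3 table must not be
dereferenced unless the covering L2 entry exists and is valid.

static pt_entry_t *
demo_lookup_l3(pmap_t pmap, vm_offset_t va)
{
        pd_entry_t *l2;

        l2 = pmap_l2(pmap, va);
        if (l2 == NULL || (*l2 & PTE_V) == 0)
                return (NULL);          /* no L3 table exists for this VA */
        return (pmap_l2_to_l3(l2, va));
}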


# 8cb0c102 10-Sep-2016 Alan Cox <alc@FreeBSD.org>

Various changes to pmap_ts_referenced()

Move PMAP_TS_REFERENCED_MAX out of the various pmap implementations and
into vm/pmap.h, and describe what its purpose is. Eliminate the archaic
"XXX" comment about its value. I don't believe that its exact value, e.g.,
5 versus 6, matters.

Update the arm64 and riscv pmap implementations of pmap_ts_referenced()
to opportunistically update the page's dirty field.

On amd64, use the PDE value already cached in a local variable rather than
dereferencing a pointer again and again.

Reviewed by: kib, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D7836


# dbbaf04f 03-Sep-2016 Mark Johnston <markj@FreeBSD.org>

Remove support for idle page zeroing.

Idle page zeroing has been disabled by default on all architectures since
r170816 and has some bugs that make it seemingly unusable. Specifically,
the idle-priority pagezero thread exacerbates contention for the free page
lock, and yields the CPU without releasing it in non-preemptive kernels. The
pagezero thread also does not behave correctly when superpage reservations
are enabled: its target is a function of v_free_count, which includes
reserved-but-free pages, but it is only able to zero pages belonging to the
physical memory allocator.

Reviewed by: alc, imp, kib
Differential Revision: https://reviews.freebsd.org/D7714


# 5f8228b2 09-Aug-2016 Ruslan Bukin <br@FreeBSD.org>

o Remove operation in machine mode.
The machine privilege level was designed for use by the vendor's
firmware or bootloader. We implemented operation in machine mode in
FreeBSD as part of understanding the RISC-V ISA, but it is time to
remove it.
We now use BBL (Berkeley Boot Loader), the standard RISC-V firmware,
which provides machine-mode operation for us, and we use standard SBI
calls into machine mode instead of handmade 'syscalls'.
o Remove the HTIF bus.
The HTIF bus is now legacy and no longer exists in the RISC-V
specification. HTIF code still exists in the Spike simulator, but BBL
does not provide a raw interface to it.
A memory disk is the only way to boot multiuser in Spike for now,
until Spike implements more devices (e.g. VirtIO).

Sponsored by: DARPA, AFRL
Sponsored by: HEIF5


# 98f50c44 02-Aug-2016 Ruslan Bukin <br@FreeBSD.org>

Update RISC-V port to Privileged Architecture Version 1.9.

Sponsored by: DARPA, AFRL
Sponsored by: HEIF5


# d3ffaee8 13-May-2016 Alan Cox <alc@FreeBSD.org>

Eliminate an unused #include. For a brief period of time, _unrhdr.h was
used to implement PCID support on amd64.

Reviewed by: kib


# 9af94226 26-Apr-2016 Ruslan Bukin <br@FreeBSD.org>

Rework the list of all pmaps: embed the list link into pmap.


# 02a37128 25-Apr-2016 Ruslan Bukin <br@FreeBSD.org>

o Implement shared page tables and switch from a 4-level to a 3-level
page table system.

The RISC-V ISA has only a single page table base register for both
kernel and user address translation. Before this commit we used an
extra (4th) level of page tables for switching between kernel and user
page tables, but then realized the FPGA hardware has a 3-level page
system hardcoded. It also became clear that the bitfile synthesized
for the 4-level system is untested/broken, so we can't use an extra
level for switching.

We now share level 1 of the page tables between kernel and user VA.
This requires keeping track of all the user pmaps created: once we add
an L1 page to the kernel pmap, we have to add it to all the user pmaps.

o Change the VM layout, as the topmost bit must be 1 for kernel
addresses and 0 for user addresses in the selected page system.
o Implement pmap_kenter_device().
o Create the l3 tables for the early devmap.

Sponsored by: DARPA, AFRL
Sponsored by: HEIF5
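
A sketch of the bookkeeping this implies (the list, link and field names are
assumptions for illustration only): a new kernel L1 entry is copied into the
L1 table of every user pmap so kernel VA stays mapped under any root table.

static void
demo_share_kernel_l1(u_int l1index, pd_entry_t entry)
{
        struct pmap *user_pmap;

        /* "allpmaps", "pm_list" and "pm_l1" are assumed names. */
        kernel_pmap->pm_l1[l1index] = entry;
        LIST_FOREACH(user_pmap, &allpmaps, pm_list)
                user_pmap->pm_l1[l1index] = entry;
}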


# 14232d42 26-Feb-2016 Ruslan Bukin <br@FreeBSD.org>

o Use uint64_t for the page number, as it doesn't fit in a uint32_t.
o Implement the growkernel bits for the L1 level of the page tables.

This allows us to boot with 128GB of physical memory.

Sponsored by: DARPA, AFRL
Sponsored by: HEIF5


# 17696c12 24-Feb-2016 Ruslan Bukin <br@FreeBSD.org>

Add support for symmetric multiprocessing (SMP).

Tested on the Spike simulator with 2 and 16 cores (TLB enabled),
so MAXCPU is set to 16 for now.

This uses FDT data to get information about CPUs
(code based on arm64 mp_machdep).

Invalidate the entire TLB, as that is the only mechanism available yet.

Sponsored by: DARPA, AFRL
Sponsored by: HEIF5


# 229f3f0d 18-Feb-2016 Ruslan Bukin <br@FreeBSD.org>

Increase kernel and user VA space.
This allows us to boot with more than 128MB of physical memory.

Sponsored by: DARPA, AFRL
Sponsored by: HEIF5


# 28029b68 29-Jan-2016 Ruslan Bukin <br@FreeBSD.org>

Welcome the RISC-V 64-bit kernel.

This is the final step required to allow compiling and running the RISC-V
kernel and userland from HEAD.

RISC-V is a completely open ISA that is freely available to academia
and industry.

Thanks to all the people involved! Special thanks to Andrew Turner,
David Chisnall, Ed Maste, Konstantin Belousov, John Baldwin and
Arun Thomas for their help.
Thanks to Robert Watson for organizing this project.

This project sponsored by UK Higher Education Innovation Fund (HEIF5) and
DARPA CTSRD project at the University of Cambridge Computer Laboratory.

FreeBSD/RISC-V project home: https://wiki.freebsd.org/riscv

Reviewed by: andrew, emaste, kib
Relnotes: Yes
Sponsored by: DARPA, AFRL
Sponsored by: HEIF5
Differential Revision: https://reviews.freebsd.org/D4982